Thursday, May 16, 2013

Search Nouns, Adjectives & Verbs in Text Using Java with Apache OpenNLP Parse

Description : 

Sometimes we need to parse a paragraph of text, or pick out the nouns, verbs, and so on from it. Sorting the words of a paragraph into nouns, verbs, etc. requires Natural Language Processing (NLP), which is a part of Artificial Intelligence. A number of tools exist for this task, and they can be used from programming languages. Apache OpenNLP provides several tools for processing a paragraph according to your requirements; here is the documentation, which explains how to use OpenNLP in different situations. Today we show how to use Apache OpenNLP from Java to process a paragraph using its parser.

Requirements :

  • Download the Apache OpenNLP jars from the OpenNLP website.
  • Download JDK 7 (I tested this with JDK 7).
  • Download the parser-chunking model (en-parser-chunking.bin) from here (many other models are available there).
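
Before running the program, it can help to confirm that the downloaded jars are actually visible to Java. The following is only a minimal sketch: the class name ClasspathCheck and the helper isOnClasspath are made up for illustration, and it assumes an OpenNLP 1.5.x setup in which the opennlp.model.* classes come from a separate maxent jar rather than from the opennlp-tools jar.

```java
class ClasspathCheck {

    // Returns true if the named class can be located on the current classpath.
    // (Hypothetical helper, not part of OpenNLP.)
    static boolean isOnClasspath(String className) {
        try {
            // initialize = false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "opennlp.tools.parser.ParserModel", // imported by the program in this post
            "opennlp.model.DataReader"          // assumption: shipped in the separate maxent jar in 1.5.x
        };
        for (String name : required) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

If a class is reported as MISSING, the jar that contains it is probably not on the classpath, which would typically show up later as a NoClassDefFoundError.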

The code of the program follows:

import java.io.FileInputStream;
import java.io.InputStream;
import java.util.HashSet;
import java.util.Set;

import opennlp.tools.cmdline.parser.ParserTool;
import opennlp.tools.parser.Parse;
import opennlp.tools.parser.Parser;
import opennlp.tools.parser.ParserFactory;
import opennlp.tools.parser.ParserModel;

public class ParserTest {

 static Set<String> nounPhrases = new HashSet<>();
 static Set<String> adjectivePhrases = new HashSet<>();
 static Set<String> verbPhrases = new HashSet<>();
 private static String line = "The Moon is a barren, rocky world without air and water. It has dark lava plain on its surface. " +
   "The Moon is filled with craters. It has no light of its own. It gets its light from the Sun. The Moon keeps changing its " +
   "shape as it moves round the Earth. It spins on its axis in 27.3 days stars were named after the Edwin Aldrin were the " +
   "first ones to set their foot on the Moon on 21 July 1969 They reached the Moon in their space craft named Apollo II";
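 // Walks the parse tree recursively and collects words by their part-of-speech tag.
 public void getNounPhrases(Parse p) {
  // Noun tags: singular, plural, proper singular, proper plural
  if (p.getType().equals("NN") || p.getType().equals("NNS") || p.getType().equals("NNP") || p.getType().equals("NNPS")) {
   nounPhrases.add(p.getCoveredText());
  }

  // Adjective tags: base, comparative, superlative
  if (p.getType().equals("JJ") || p.getType().equals("JJR") || p.getType().equals("JJS")) {
   adjectivePhrases.add(p.getCoveredText());
  }

  // Verb tags checked here: VB, VBP, VBG, VBD, VBN (VBZ is not included)
  if (p.getType().equals("VB") || p.getType().equals("VBP") || p.getType().equals("VBG") || p.getType().equals("VBD") || p.getType().equals("VBN")) {
   verbPhrases.add(p.getCoveredText());
  }

  // Recurse into the children of this node
  for (Parse child : p.getChildren()) {
   getNounPhrases(child);
  }
 }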
 public void parserAction() throws Exception {
  // Load the parser model; try-with-resources closes the stream afterwards
  try (InputStream is = new FileInputStream("en-parser-chunking.bin")) {
   ParserModel model = new ParserModel(is);
   Parser parser = ParserFactory.create(model);
   // Request only the single best parse of the text
   Parse[] topParses = ParserTool.parseLine(line, parser, 1);
   for (Parse p : topParses) {
    //p.show();
    getNounPhrases(p);
   }
  }
 }

 public static void main(String[] args) throws Exception {
  new ParserTest().parserAction();
  System.out.println("List of Noun Parse : "+nounPhrases);
  System.out.println("List of Adjective Parse : "+adjectivePhrases);
  System.out.println("List of Verb Parse : "+verbPhrases);
 }
}

In this program, "NN", "NNP", etc. are the part-of-speech tag codes used to find nouns, adjectives, verbs, and so on. Here is the list of all codes.
Download the example code from here.
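
The tag codes mentioned above can also be grouped into sets, which keeps the checks shorter than a chain of equals() calls. This is only a sketch: the class PosTagGroups and the method categoryOf are hypothetical names, not part of OpenNLP. The verb set below also includes VBZ (third-person singular present, as in "is" or "has"), which belongs to the Penn Treebank verb tags but is not checked by the program above; including it is an assumption about what one would usually want.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class PosTagGroups {

    // Tag groups as used in the program above
    static final Set<String> NOUN_TAGS =
            new HashSet<>(Arrays.asList("NN", "NNS", "NNP", "NNPS"));
    static final Set<String> ADJECTIVE_TAGS =
            new HashSet<>(Arrays.asList("JJ", "JJR", "JJS"));
    // Assumption: VBZ is added here; the original program does not match it
    static final Set<String> VERB_TAGS =
            new HashSet<>(Arrays.asList("VB", "VBD", "VBG", "VBN", "VBP", "VBZ"));

    // Maps a tag code to a coarse category name (hypothetical helper)
    static String categoryOf(String tag) {
        if (NOUN_TAGS.contains(tag)) {
            return "noun";
        }
        if (ADJECTIVE_TAGS.contains(tag)) {
            return "adjective";
        }
        if (VERB_TAGS.contains(tag)) {
            return "verb";
        }
        return "other";
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"NNP", "JJ", "VBZ", "DT"}) {
            System.out.println(tag + " -> " + categoryOf(tag));
        }
    }
}
```

In the program above, a call such as categoryOf(p.getType()) could stand in for the three if conditions; tags outside the three groups, such as DT, fall through to "other".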

19 comments:

  1. May I know how to extract noun phrases / verb phrases / prepositional phrases when using the chunker ("en-chunker.bin")?

    Replies
    1. These are the codes to extract nouns, verbs, etc.:
      http://bulba.sdsu.edu/jeanette/thesis/PennTags.html

  2. This comment has been removed by the author.

  3. Hello... I am really new to OpenNLP! I copied and pasted your code into Eclipse and added the external jars, but I am still getting an error:
    "Exception in thread "main" java.lang.NoClassDefFoundError: opennlp/model/DataReader
    at opennlp.tools.util.model.BaseModel.createArtifactSerializers(BaseModel.java:343)
    at opennlp.tools.util.model.BaseModel.createBaseArtifactSerializers(BaseModel.java:374)
    at opennlp.tools.util.model.BaseModel.loadModel(BaseModel.java:211)
    at opennlp.tools.util.model.BaseModel.<init>(BaseModel.java:181)
    at opennlp.tools.parser.ParserModel.<init>(ParserModel.java:152)
    at ParserTest.parserAction(ParserTest.java:47)
    at ParserTest.main(ParserTest.java:57)
    Caused by: java.lang.ClassNotFoundException: opennlp.model.DataReader
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    ... 7 more
    "
    I would be most grateful if you could please help me :) Thank you.

    Replies
    1. Are your OpenNLP jars on the classpath?

    2. Please extend the ParserTest class in a new class and write the main method there, using inheritance.

  4. This comment has been removed by the author.

  5. I get this exception in Eclipse (Android):
    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Unknown Source)
    at java.lang.String.<init>(Unknown Source)
    at java.io.DataInputStream.readUTF(Unknown Source)
    at java.io.DataInputStream.readUTF(Unknown Source)
    at opennlp.maxent.io.BinaryGISModelReader.readUTF(Unknown Source)
    at opennlp.maxent.io.SuffixSensitiveGISModelReader.readUTF(Unknown Source)
    at opennlp.maxent.io.GISModelReader.getPredicates(Unknown Source)
    at opennlp.maxent.io.GISModelReader.getModel(Unknown Source)

  6. How do I open a text file and then find the nouns and adjectives in it?
    Please help.

  7. Hello,
    For the above program, when I initialise
    private static String line ="ANDROID android Android";
    I get the output
    (TOP (NP (DT android) (NN android) (CC android)))

    If I put only "android" as the line value, it doesn't identify the word, but when I put "ANDROID android " it identifies only the second "android" (though it is case sensitive). The same happens for every word.
    I have also added line=line.toLowerCase(); to ignore case.

    private static String line ="synful Synful";
    (TOP (NP (JJ synful) (NN synful)))
    List of Noun Parse : [synful]
    List of Adjective Parse : [synful]
    List of Verb Parse : []

    Why is the output different?

    Replies
    1. Hello Monica,

      The text search depends on the OpenNLP tools. There are many of them and I am not sure which one you are using, so please verify.

  8. I need noun phrases like "The Moon" or "Convention centre", but the program splits them into two words. My name is Regnath Franco, and it splits my name too. How can I get these as whole phrases?
    Please help me...

  9. Hello harmeet,

    I am currently using the OpenNLP models you mentioned, i.e. http://opennlp.sourceforge.net/models-1.5/, with the en-parser-chunking.bin file.

    I am still facing the issues.

    Thanks Monica

  10. Hello Harmeet,

    I get exceptions when I run this program. I added the external jars. The exceptions are as follows.

    Exception in thread "main" java.lang.NoClassDefFoundError: opennlp/model/DataReader
    at opennlp.tools.util.model.BaseModel.createArtifactSerializers(BaseModel.java:343)
    at opennlp.tools.util.model.BaseModel.createBaseArtifactSerializers(BaseModel.java:374)
    at opennlp.tools.util.model.BaseModel.loadModel(BaseModel.java:211)
    at opennlp.tools.util.model.BaseModel.<init>(BaseModel.java:181)
    at opennlp.tools.parser.ParserModel.<init>(ParserModel.java:152)
    at ParserTest.parserAction(ParserTest.java:37)
    at ParserTest.main(ParserTest.java:46)
    Caused by: java.lang.ClassNotFoundException: opennlp.model.DataReader
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

    I am not able to resolve it... Please help me.


    Replies
    1. Hi Swedha,

      Before running your program, please make sure the OpenNLP jar files are on the classpath and that you are using the right jars. This error occurs when the classes are not on the classpath, when your jars conflict with others, or when you are not using the appropriate jars.

    2. Hi Harmeet,

      Thank you for your reply. I was able to fix the problem. The program ran and produced the output. Thanks a lot.

  11. Hi

    In Android Studio, with the jar file "opennlp-tools-1.5.3.jar",

    I am getting this error:

    java.lang.NoClassDefFoundError: opennlp.model.GenericModelReader
    opennlp.tools.util.model.GenericModelSerializer.create(GenericModelSerializer.java:35)

  12. Hi Harmeet,

    Thanks for the nice example.

  13. Hi Harmeet,

    I have implemented this code and also imported the jar files in Android App. The app is showing result but it takes 3-4 minutes to give the result and also the APK size becomes 38.6 MB. Also, the app is running on a single device. Please suggest some solution.
