I want to extract entities from a query. How do I train OpenNLP for a custom NameFinder model?
I have a custom NameFinder model.
The queries look like this.
result for roll number 1304510020.
result for roll-number 1304510020.
result for rollnumber 1304510020.
result of rollnumber 1304510020.
result of roll number 1304510020.
result of roll-number 1304510020.
roll number 1304510020 result.
rollnumber 1304510020 result.
roll-number 1304510020 result.
show result of roll number 1304510020.
show result of rollnumber 1304510020.
show result of roll-number 1304510020.
show my result for 1304510020.
result of 1304510020.
This is my training code.
package nlpParser;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderFactory;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.InputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;
public class Trainer {
    // Directory that holds the training data and receives the trained models
    static String trainingPath =
            "C:\\Users\\MujeebulHasan\\Desktop\\Project\\hbtu\\hbtuaiagent\\Source Code\\parser\\training\\";

    public static void main(String[] args) throws IOException {
        String[] entities = new String[]{"rollnumber", "result"};
        String[] pathsOfTrainingFile = new String[]{"rollnumber\\rollnumber.train", "result\\result.train"};
        String[] pathsOfTrainedFile = new String[]{"rollnumber\\rollnumber.bin", "result\\result.bin"};

        // Train one model per entity type
        for (int i = 0; i < entities.length; i++) {
            final int j = i;
            InputStreamFactory isf = new InputStreamFactory() {
                public InputStream createInputStream() throws IOException {
                    return new FileInputStream(trainingPath + pathsOfTrainingFile[j]);
                }
            };
            Charset charset = Charset.forName("UTF-8");
            ObjectStream<String> lineStream = new PlainTextByLineStream(isf, charset);
            ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);
            TokenNameFinderModel model;
            TokenNameFinderFactory nameFinderFactory = new TokenNameFinderFactory();
            try {
                model = NameFinderME.train("en", entities[i], sampleStream,
                        TrainingParameters.defaultParams(), nameFinderFactory);
            } finally {
                sampleStream.close();
            }
            // Write the trained model next to its training file
            BufferedOutputStream modelOut = null;
            try {
                modelOut = new BufferedOutputStream(
                        new FileOutputStream(trainingPath + pathsOfTrainedFile[i]));
                model.serialize(modelOut);
            } finally {
                if (modelOut != null)
                    modelOut.close();
            }
        }
    }
}
rollnumber.train
result for roll number <START:rollnumber> 1304510020 <END> .
result for roll-number <START:rollnumber> 1304510020 <END> .
result for rollnumber <START:rollnumber> 1304510020 <END> .
result for roll <START:rollnumber> 1304510020 <END> .
result of rollnumber <START:rollnumber> 1304510020 <END> .
result of roll number <START:rollnumber> 1304510020 <END> .
result of roll-number <START:rollnumber> 1304510020 <END> .
result of roll <START:rollnumber> 1304510020 <END> .
roll number <START:rollnumber> 1304510020 <END> result.
rollnumber <START:rollnumber> 1304510020 <END> result.
roll-number <START:rollnumber> 1304510020 <END> result.
roll <START:rollnumber> 1304510020 <END> result.
show result of roll number <START:rollnumber> 1304510020 <END> .
show result of rollnumber <START:rollnumber> 1304510020 <END> .
show result of roll-number <START:rollnumber> 1304510020 <END> .
show result of roll <START:rollnumber> 1304510020 <END> .
show my result for <START:rollnumber> 1304510020 <END> .
result of <START:rollnumber> 1304510020 <END> .
result for <START:rollnumber> 1304510020 <END> .
what is my result for rollnumber <START:rollnumber> 1304510020 <END> .
what is my result of rollnumber <START:rollnumber> 1304510020 <END> .
what is my result for roll <START:rollnumber> 1304510020 <END> .
result.train
<START:result> result <END> for roll number 1304510020.
<START:result> result <END> for roll-number 1304510020.
<START:result> result <END> for rollnumber 1304510020.
<START:result> result <END> of rollnumber 1304510020.
<START:result> result <END> of roll number 1304510020.
<START:result> result <END> of roll-number 1304510020.
roll number 1304510020 <START:result> result <END> .
rollnumber 1304510020 <START:result> result <END> .
roll-number 1304510020 <START:result> result <END> .
show <START:result> result <END> of roll number 1304510020.
show <START:result> result <END> of rollnumber 1304510020.
show <START:result> result <END> of roll-number 1304510020.
show my <START:result> result <END> for 1304510020.
<START:result> result <END> of 1304510020.
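For reference, the two files above can be inspected token by token. The sketch below is a hand-rolled reader for the `<START:type> ... <END>` annotation format, written only with the Java standard library; it is not OpenNLP code, and the class name `TrainingLineInspector` and its methods are hypothetical. It assumes the format is whitespace-tokenized, so a period is a separate token only when a space precedes it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-rolled reader for "<START:type> ... <END>" annotated lines (illustrative, not OpenNLP code). */
class TrainingLineInspector {

    /** True for an opening tag that names a type, such as <START:rollnumber>. */
    static boolean isStartTag(String part) {
        return part.startsWith("<START:") && part.endsWith(">");
    }

    /** Tokens of the line after whitespace splitting, with annotation tags dropped. */
    static List<String> tokens(String annotatedLine) {
        List<String> out = new ArrayList<>();
        for (String part : annotatedLine.trim().split("\\s+")) {
            boolean isTag = isStartTag(part) || part.equals("<END>");
            if (!isTag && !part.isEmpty()) {
                out.add(part);
            }
        }
        return out;
    }

    /** Annotated spans rendered as "[start..end) type", using token indices. */
    static List<String> spans(String annotatedLine) {
        List<String> out = new ArrayList<>();
        int index = 0;      // index of the next real token
        int start = -1;     // token index where the currently open span began
        String type = null; // type of the currently open span
        for (String part : annotatedLine.trim().split("\\s+")) {
            if (isStartTag(part)) {
                type = part.substring("<START:".length(), part.length() - 1);
                start = index;
            } else if (part.equals("<END>")) {
                if (start >= 0) {
                    out.add("[" + start + ".." + index + ") " + type);
                }
                start = -1;
            } else if (!part.isEmpty()) {
                index++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String fromRollnumberTrain = "result for roll number <START:rollnumber> 1304510020 <END> .";
        String fromResultTrain = "show my <START:result> result <END> for 1304510020.";
        System.out.println(tokens(fromRollnumberTrain) + " " + spans(fromRollnumberTrain));
        System.out.println(tokens(fromResultTrain) + " " + spans(fromResultTrain));
    }
}
```

Under that assumption, the line from rollnumber.train ends with the two tokens `1304510020` and `.`, while the line from result.train ends with the single token `1304510020.`, so the two files do not tokenize the roll number the same way.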
I then test the models with this code.
package nlpParser;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Scanner;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.util.Span;
public class GetEntities {
    public static void main(String[] args) throws IOException {
        Scanner sc = new Scanner(System.in);
        GetEntities obj = new GetEntities();
        // Read queries until a blank line or the end of input
        while (sc.hasNextLine()) {
            String query = sc.nextLine();
            if (query.trim().isEmpty()) {
                break;
            }
            obj.parse(query);
        }
        sc.close();
    }

    public void parse(String query) throws IOException {
        String[] entities = new String[]{"rollnumber", "result"};
        String[] pathsOfTrainedFile = new String[]{"rollnumber\\rollnumber.bin", "result\\result.bin"};

        // Run the query through each entity model in turn
        for (int i = 0; i < entities.length; i++) {
            // Load the NER model and close the file once it has been read
            TokenNameFinderModel model;
            try (InputStream inputStream = new FileInputStream(
                    "C:\\Users\\MujeebulHasan\\Desktop\\Project\\hbtu\\hbtuaiagent\\Source Code\\parser\\training\\" + pathsOfTrainedFile[i])) {
                model = new TokenNameFinderModel(inputStream);
            }
            // Instantiate the NameFinder class
            NameFinderME nameFinder = new NameFinderME(model);

            // Find the names in the sentence
            System.out.println("Processing query... ");
            System.out.print("Query = " + query);
            query = query.replace(".", "");
            String[] sentence = query.split(" ");
            System.out.println();
            System.out.println("RESULT :");
            Span[] nameSpans = nameFinder.find(sentence);
            // Print each span and the first token it covers
            for (Span s : nameSpans) {
                System.out.println(s.toString());
                System.out.println(sentence[s.getStart()]);
            }
        }
    }
}
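It gives the following results, which are sometimes wrong. Each query is processed twice, first by the rollnumber model and then by the result model. The result entity is found every time, but the rollnumber model returns a span for only one of the five queries below.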
result of roll number 1304510020
Processing query...
Query = result of roll number 1304510020
RESULT :
Processing query...
Query = result of roll number 1304510020
RESULT :
[0..1) result
result
show result for roll number 1304510020
Processing query...
Query = show result for roll number 1304510020
RESULT :
Processing query...
Query = show result for roll number 1304510020
RESULT :
[1..2) result
result
result for rollnumber 1304510020
Processing query...
Query = result for rollnumber 1304510020
RESULT :
[3..4) rollnumber
1304510020
Processing query...
Query = result for rollnumber 1304510020
RESULT :
[0..1) result
result
result 1304510020
Processing query...
Query = result 1304510020
RESULT :
Processing query...
Query = result 1304510020
RESULT :
[0..1) result
result
1304510020 result
Processing query...
Query = 1304510020 result
RESULT :
Processing query...
Query = 1304510020 result
RESULT :
[1..2) result
result
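To make the printed spans easier to check by hand, the sketch below reproduces only the preprocessing that `parse` applies before it calls `nameFinder.find`: delete every period, then split on single spaces. It uses the Java standard library alone and does not call OpenNLP. The class name `QueryTokens` and the helper `firstNumericToken` are hypothetical, and the digit check is a plain string test for locating the expected position, not a model prediction.

```java
import java.util.Arrays;

/** Reproduces the query preprocessing from parse(); illustrative only, no OpenNLP involved. */
class QueryTokens {

    /** Same two steps as the test code: delete every '.', then split on single spaces. */
    static String[] tokenize(String query) {
        return query.replace(".", "").split(" ");
    }

    /** Index of the first all-digit token, or -1 if there is none. */
    static int firstNumericToken(String[] tokens) {
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].matches("\\d+")) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] tokens = tokenize("result of roll number 1304510020.");
        System.out.println(Arrays.toString(tokens));
        int i = firstNumericToken(tokens);
        if (i >= 0) {
            // Token position at which a rollnumber span would be expected
            System.out.println("[" + i + ".." + (i + 1) + ") " + tokens[i]);
        }
    }
}
```

For the query `result of roll number 1304510020`, a correct rollnumber span would therefore be `[4..5)`. Note also that the query tokens never contain a period, whereas the lines in rollnumber.train end with a separate `.` token; whether that difference contributes to the missed detections is not established here.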