2017-06-05

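I want to initialize the StanfordNLP pipeline once and use it many times without initializing it again, to improve execution time. How can I initialize the StanfordNLP pipeline once and reuse it without re-initializing it?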

Is this possible?

Here is my code:

public static boolean isHeaderMatched(String string) { 

    // creates a StanfordCoreNLP object. 
    Properties props = new Properties(); 
    props.put("annotators", "tokenize, ssplit, pos, lemma, ner"); 

    RedwoodConfiguration.current().clear().apply(); 
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props); 

    Env env = TokenSequencePattern.getNewEnv(); 
    env.setDefaultStringMatchFlags(NodePattern.CASE_INSENSITIVE); 
    env.setDefaultStringPatternFlags(Pattern.CASE_INSENSITIVE); 

    Annotation document = new Annotation(string); 

    // use the pipeline to annotate the document we created 
    pipeline.annotate(document); 
    List<CoreMap> sentences = document.get(SentencesAnnotation.class); 

    CoreMapExpressionExtractor extractor = CoreMapExpressionExtractor.createExtractorFromFiles(env, "./app/utils/Summarizer/mapping/career_objective.rule", "./app/utils/Summarizer/mapping/personal_info.rule", "./app/utils/Summarizer/mapping/education.rule", "./app/utils/Summarizer/mapping/work_experience.rule", "./app/utils/Summarizer/mapping/certification.rule", "./app/utils/Summarizer/mapping/publication.rule", "./app/utils/Summarizer/mapping/award_achievement.rule", "./app/utils/Summarizer/mapping/hobbies_interest.rule", "./app/utils/Summarizer/mapping/lang_known.rule", "./app/utils/Summarizer/mapping/project_details.rule", "./app/utils/Summarizer/mapping/skill-set.rule", "./app/utils/Summarizer/mapping/misc_header.rule"); 

    boolean flag = false; 
    for (CoreMap sentence : sentences) { 
     List<MatchedExpression> matched = extractor.extractExpressions(sentence); 
     //System.out.println("Probable Header is : " + matched); 
     Set<String> uniqueMatchedKeyWordSet = DocumentParserUtil.removeDuplicate(matched); 
     System.out.println("Matched: " + uniqueMatchedKeyWordSet + " and Size of MatchedSet: " + uniqueMatchedKeyWordSet.size()); 

     // header check: the number of matched expressions must be at least half (rounded down) the number of words in the string
     if ((matched.size() >= uniqueMatchedKeyWordSet.size()) && !matched.isEmpty() && matched.size() >= Math.floorDiv(string.split("\\s").length, 2)) { 
       //System.out.println("This is sure a header!"); 
      flag = true; 
     } else { 
      flag = false; 
     } 
    /*for(MatchedExpression phrase: matched){ 
    System.out.println("matched header type: " + phrase.getValue().get()); 
    }*/ 
    } 
    return flag; 
} 

I want the following part of the code, which loads the models, to run only on the first call to the method above.

// creates a StanfordCoreNLP object.
Properties props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma, ner");

RedwoodConfiguration.current().clear().apply();
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

Env env = TokenSequencePattern.getNewEnv();
env.setDefaultStringMatchFlags(NodePattern.CASE_INSENSITIVE);
env.setDefaultStringPatternFlags(Pattern.CASE_INSENSITIVE);

Thanks in advance.


You could move the 'StanfordCoreNLP' variable from the method's scope to the class's scope and put the code you want to run once into a 'static {}' block.


If I put the last block of code mentioned above into a 'static {}' block, the two variables 'pipeline' and 'env' are not recognized in the static method.
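The variables are not visible because anything declared inside static {} is a local variable of that initializer block, exactly as in a method body. They have to be declared as static fields of the class and only assigned inside the block. A minimal sketch of this, assuming a hypothetical ExpensiveModel class standing in for StanfordCoreNLP (it only counts how often it is constructed); the counter shows that the static block runs once no matter how many times the static method is called:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an expensive object such as StanfordCoreNLP.
class ExpensiveModel {
    // Counts how many times the constructor has run.
    static final AtomicInteger CONSTRUCTIONS = new AtomicInteger();

    ExpensiveModel() {
        CONSTRUCTIONS.incrementAndGet();
    }

    int countWords(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }
}

class StaticFieldDemo {
    // Declared at class scope, so every static method of the class can see it.
    private static final ExpensiveModel model;

    static {
        // Assign to the field. Writing "ExpensiveModel model = ..." here instead
        // would declare a local variable visible only inside this block.
        model = new ExpensiveModel();
    }

    static int wordCount(String text) {
        return model.countWords(text);
    }
}
```

Calling StaticFieldDemo.wordCount repeatedly reuses the same model; ExpensiveModel.CONSTRUCTIONS stays at 1.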

Answer


Here is an example of what you could do:

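import java.util.List;
import java.util.Properties;
import java.util.Set;
import java.util.regex.Pattern;

import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.ling.tokensregex.CoreMapExpressionExtractor;
import edu.stanford.nlp.ling.tokensregex.Env;
import edu.stanford.nlp.ling.tokensregex.MatchedExpression;
import edu.stanford.nlp.ling.tokensregex.NodePattern;
import edu.stanford.nlp.ling.tokensregex.TokenSequencePattern;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;
import edu.stanford.nlp.util.logging.RedwoodConfiguration;
// DocumentParserUtil is the asker's own helper class; its package is not shown.

public class Example { 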
    private static StanfordCoreNLP pipeline; 
    private static Env env; 

    static { 
     // creates a StanfordCoreNLP object. 
     Properties props = new Properties(); 
     props.put("annotators", "tokenize, ssplit, pos, lemma, ner"); 

     RedwoodConfiguration.current().clear().apply(); 
     pipeline = new StanfordCoreNLP(props); 

     env = TokenSequencePattern.getNewEnv(); 
     env.setDefaultStringMatchFlags(NodePattern.CASE_INSENSITIVE); 
     env.setDefaultStringPatternFlags(Pattern.CASE_INSENSITIVE); 
    } 

    public static boolean isHeaderMatched(String string) { 
     Annotation document = new Annotation(string); 

     // use the pipeline to annotate the document we created 
     pipeline.annotate(document); 
     List<CoreMap> sentences = document.get(SentencesAnnotation.class); 

     CoreMapExpressionExtractor extractor = CoreMapExpressionExtractor.createExtractorFromFiles(env, "./app/utils/Summarizer/mapping/career_objective.rule", "./app/utils/Summarizer/mapping/personal_info.rule", "./app/utils/Summarizer/mapping/education.rule", "./app/utils/Summarizer/mapping/work_experience.rule", "./app/utils/Summarizer/mapping/certification.rule", "./app/utils/Summarizer/mapping/publication.rule", "./app/utils/Summarizer/mapping/award_achievement.rule", "./app/utils/Summarizer/mapping/hobbies_interest.rule", "./app/utils/Summarizer/mapping/lang_known.rule", "./app/utils/Summarizer/mapping/project_details.rule", "./app/utils/Summarizer/mapping/skill-set.rule", "./app/utils/Summarizer/mapping/misc_header.rule"); 

     boolean flag = false; 
     for (CoreMap sentence : sentences) { 
      List<MatchedExpression> matched = extractor.extractExpressions(sentence); 
      //System.out.println("Probable Header is : " + matched); 
      Set<String> uniqueMatchedKeyWordSet = DocumentParserUtil.removeDuplicate(matched); 
      System.out.println("Matched: " + uniqueMatchedKeyWordSet + " and Size of MatchedSet: " + uniqueMatchedKeyWordSet.size()); 

      // header check: the number of matched expressions must be at least half (rounded down) the number of words in the string
      if ((matched.size() >= uniqueMatchedKeyWordSet.size()) && !matched.isEmpty() && matched.size() >= Math.floorDiv(string.split("\\s").length, 2)) { 
       flag = true; 
      } else { 
       flag = false; 
      } 

     } 

     return flag; 
    } 

} 

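In the code above, the static block is executed when the class is loaded. If you don't want that behavior, expose an init method instead, as shown below: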

public class Example { 
    private static StanfordCoreNLP pipeline; 
    private static Env env; 

    public static void init() { 
     // creates a StanfordCoreNLP object. 
     Properties props = new Properties(); 
     props.put("annotators", "tokenize, ssplit, pos, lemma, ner"); 

     RedwoodConfiguration.current().clear().apply(); 
     pipeline = new StanfordCoreNLP(props); 

     env = TokenSequencePattern.getNewEnv(); 
     env.setDefaultStringMatchFlags(NodePattern.CASE_INSENSITIVE); 
     env.setDefaultStringPatternFlags(Pattern.CASE_INSENSITIVE); 
    } 

    public static boolean isHeaderMatched(String string) { 
     // code left out for brevity 
    } 

} 

It can then be called from another class like this:

Example.init(); 
Example.isHeaderMatched("foobar"); 
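If you want the models built on the first call to isHeaderMatched without requiring callers to remember init(), the initialization-on-demand holder idiom is another option: the JVM initializes a nested class only when it is first used, and does so at most once, even with several threads. A minimal sketch, assuming a hypothetical HeavyPipeline class in place of StanfordCoreNLP and placeholder matching logic; in real code the holder could also keep the CoreMapExpressionExtractor, which the code above rebuilds from the rule files on every call.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for StanfordCoreNLP (and, in real code, the extractor).
class HeavyPipeline {
    // Counts how many pipelines have been built.
    static final AtomicInteger INSTANCES = new AtomicInteger();

    HeavyPipeline() {
        INSTANCES.incrementAndGet(); // expensive model loading would happen here
    }

    int annotate(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }
}

final class LazyExample {
    private LazyExample() {}

    // Holder is initialized only when Holder.PIPELINE is first read;
    // class initialization runs at most once, even across threads.
    private static final class Holder {
        static final HeavyPipeline PIPELINE = new HeavyPipeline();
    }

    // Does not touch Holder, so calling it does not build the pipeline.
    static String describe() {
        return "header matcher";
    }

    static boolean isHeaderMatched(String text) {
        // Placeholder logic: the first call triggers Holder initialization,
        // later calls reuse the same PIPELINE instance.
        return Holder.PIPELINE.annotate(text) > 0;
    }
}
```

The trade-off against the explicit init() method is that the first call to isHeaderMatched pays the loading cost instead of a startup call you control.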

While writing this answer I noticed that your logic may be flawed. The code below may not produce the behavior you want.

boolean flag = false; 
for (CoreMap sentence : sentences) { 
    List<MatchedExpression> matched = extractor.extractExpressions(sentence); 
    //System.out.println("Probable Header is : " + matched); 
    Set<String> uniqueMatchedKeyWordSet = DocumentParserUtil.removeDuplicate(matched); 
    System.out.println("Matched: " + uniqueMatchedKeyWordSet + " and Size of MatchedSet: " + uniqueMatchedKeyWordSet.size()); 

    // header check: the number of matched expressions must be at least half (rounded down) the number of words in the string
    if ((matched.size() >= uniqueMatchedKeyWordSet.size()) && !matched.isEmpty() && matched.size() >= Math.floorDiv(string.split("\\s").length, 2)) { 
     flag = true; 
    } else { 
     flag = false; 
    } 

} 

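You iterate over each CoreMap in your List<CoreMap> collection sentences. On every iteration you set flag to the result of the condition, and that is the problem: the boolean flag only reflects the result for the last sentence that went through the condition. If you need to know the result for each sentence, keep a list of booleans to track the results; otherwise remove the loop and check only the last sentence (since that is effectively what your loop does).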
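A minimal sketch of the difference, assuming sentences are plain strings and isHeader is a hypothetical predicate standing in for the extractor-based condition. It contrasts the original overwrite-in-a-loop shape with keeping one result per sentence, and also shows "any sentence" and "all sentences" aggregations in case one of those is the intended meaning:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// isHeader is a hypothetical predicate standing in for the
// extractor-based condition used in isHeaderMatched.
final class SentenceResults {
    private SentenceResults() {}

    // Same shape as the original loop: flag is overwritten on every
    // iteration, so only the last sentence determines the result.
    static boolean lastOnly(List<String> sentences, Predicate<String> isHeader) {
        boolean flag = false;
        for (String sentence : sentences) {
            flag = isHeader.test(sentence);
        }
        return flag;
    }

    // One result per sentence, in order.
    static List<Boolean> perSentence(List<String> sentences, Predicate<String> isHeader) {
        List<Boolean> results = new ArrayList<>();
        for (String sentence : sentences) {
            results.add(isHeader.test(sentence));
        }
        return results;
    }

    // True if at least one sentence satisfies the condition.
    static boolean anySentence(List<String> sentences, Predicate<String> isHeader) {
        return sentences.stream().anyMatch(isHeader);
    }

    // True only if every sentence satisfies it; unlike the original loop,
    // this returns true for an empty list.
    static boolean allSentences(List<String> sentences, Predicate<String> isHeader) {
        return sentences.stream().allMatch(isHeader);
    }
}
```

With the sentences "Education" and "I studied computer science." and a condition that only the first one satisfies, lastOnly returns false even though the first sentence matched, while perSentence returns [true, false].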