String is not checked for stop words correctly

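I am reading stop words from a file and storing them in a HashSet. I then compare that HashSet against a String to check for stop words, but the check does not work correctly.

If I put a single word such as "the" in the String variable, the output is "Yes". But if I put in something like "Apple is it" or "It is an apple", the output is "No", even though the String variable contains stop words.

Here is the whole program, with two methods: one to read the file and one to remove stop words: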

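import java.io.File;
import java.util.HashSet;
import java.util.Scanner;

private static HashSet<String> readFile() {
    HashSet<String> hset = new HashSet<String>();

    // try-with-resources closes the Scanner even if reading fails, and
    // avoids a NullPointerException in cleanup when the file cannot be opened.
    try (Scanner x = new Scanner(new File("StopWordsEnglish"))) {
        // Each whitespace-separated token in the file becomes one stop word.
        while (x.hasNext()) {
            hset.add(x.next());
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return hset;
}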

public static void removeStopWords(){ 
    HashSet<String> hset = readFile(); 
    System.out.println(hset.size()); 
    System.out.println("Enter a word to search for: "); 
    String search = "is"; 
    String s = search.toLowerCase(); 
    System.out.println(s); 

    if (hset.contains(s)) { 
     System.out.println("Yes"); 
    } else { 
     System.out.println("No"); 
    } 
}

Source

2017-05-16 R. Haroon

Use a debugger and find out. – Jens

I have a feeling that I am not reading your question correctly, but here goes.

Assuming:

String search = "it is an apple";

then you should probably split the string and check each word individually.

String[] split = search.split(" ");
boolean found = false;
for (String s : split) {
    if (hset.contains(s.toLowerCase())) {
        found = true;
        break; // no need to continue once a stop word is found
    }
}
System.out.println(found ? "Yes" : "No");
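
For reference, here is a minimal self-contained sketch of the same idea that can be run without the stop word file. The class name `StopWordCheck`, the helper `containsStopWord`, and the four-word stop list are made up for illustration; in the original program the set would come from `readFile()`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class StopWordCheck {
    // Returns true if any whitespace-separated word in text is in stopWords.
    static boolean containsStopWord(String text, Set<String> stopWords) {
        for (String word : text.trim().split("\\s+")) {
            if (stopWords.contains(word.toLowerCase())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical stop list standing in for the contents of StopWordsEnglish.
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "is", "it", "an"));

        // Looking up the whole sentence asks whether the set holds that exact string.
        System.out.println(stopWords.contains("apple is it"));          // false
        // Looking up each word separately finds "is".
        System.out.println(containsStopWord("Apple is it", stopWords)); // true
        System.out.println(containsStopWord("green apple", stopWords)); // false
    }
}
```

The first lookup shows why the original code prints "No": `HashSet.contains` tests for an element equal to the whole argument, not for any word inside it.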

Source

2017-05-16 08:31:56 user2980932

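Splitting sounds like a good and appropriate thing to do in this case, but I would add that tokenization can be a difficult and subtle problem, for example: https://www.tutorialspoint.com/opennlp/opennlp_tokenization.htm – hugh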
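
To illustrate one of the simpler pitfalls: splitting on spaces leaves punctuation attached, so "is." or "Is," will not match the stop word "is". The sketch below is a rough regex-based workaround, not a real tokenizer like the OpenNLP one linked above. The class name `SimpleTokenizer`, its methods, and the stop list are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SimpleTokenizer {
    // Splits on runs of characters that are neither letters nor apostrophes,
    // so "apple." becomes "apple" while "don't" stays in one piece.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}']+")) {
            if (!t.isEmpty()) { // split can yield a leading empty string
                tokens.add(t);
            }
        }
        return tokens;
    }

    static boolean containsStopWord(String text, Set<String> stopWords) {
        for (String t : tokenize(text)) {
            if (stopWords.contains(t)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical stop list; the real one would be read from the file.
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "is", "it", "an"));
        System.out.println(tokenize("Is, it?"));
        System.out.println(containsStopWord("Apple? Is.", stopWords));
    }
}
```

This still ignores hyphenated words, numbers, and language-specific rules, which is where a proper tokenizer becomes worth the extra dependency.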
