如何在HashSet中搜索時使用正則表達式

我正在編寫一個Java程序，我需要從Set中搜索特定的單詞。必須搜索的詞就像（「wo.d」），其中'。'可以用任何其他字母來代替。我正在使用正則表達式來匹配這種類型的單詞情況。如何在HashSet中搜索時使用正則表達式

這是我迄今爲止

HashSet<String> words = new HashSet<String>();//this set is already populated 
String word = "t.st"; 
if(word.contains(".")){ 
    Pattern p = Pattern.compile(word); 
    Matcher m; 
    boolean match = false; 
    for(String setWord : words){ 
     m = p.matcher(setWord); 
     if(m.matches()) 
      match = true; 
    } 
    if(match) 
     System.out.println("Its a match"); 
    else 
     System.out.println("Its not a match"); 
} 
else{ 
    System.out.println("The word does not contain regex do other stuff"); 
}

上述工程的代碼，但效率不高，因爲它被稱爲多次在一秒鐘。所以它會在程序中產生滯後。

來源

2016-09-24 Vipul Behl

是否有該組多個匹配的任何機會呢？ – Rehman

多重比賽是什麼意思？如果我使用正則表達式搜索單詞「w.od」，它會給出多個匹配的單詞「word＆wood」。 –

你需要停止迭代爲一旦你得到一個匹配，所以假設你使用Java 8，你的for循環可以被有效地重寫爲下一個：

boolean match = words.stream().anyMatch(w -> p.matcher(w).matches());

您還可以使用parallelStream()而不是stream()來並行化研究，特別是如果您的Set有很多單詞。

如果您不使用Java 7，仍然可以使用FluentIterable從Google Guava完成，但不幸的是不能並行化研究。

boolean match = FluentIterable.from(words).anyMatch(
    new Predicate<String>() { 
     @Override 
     public boolean apply(@Nullable final String w) { 
      return p.matcher(w).matches(); 
     } 
    } 
);

但在你的情況，我不認爲使用FluentIterable可以比簡單地增加一個break更有趣，當你得到一個比賽，因爲它仍然會更容易閱讀和維護

if (p.matcher(setWord).matches()) { 
    match = true; 
    break; 
}

所以，如果你真的需要使用正則表達式並且你不能使用Java 8，你最好的選擇是使用break，如上所述，沒有什麼魔術可以考慮。

假設你將只有一個字符替換，它可以用做startsWith(String)和endsWith(String)這將永遠是比一個正則表達式快得多。類似這樣的：

// Your words should be in a TreeSet to be already sorted alphabetically 
// in order to get a match as fast as possible 
Set<String> words = new TreeSet<String>(); //this set is already populated 
int index = word.indexOf('.'); 
if (index != -1) { 
    String prefix = word.substring(0, index); 
    String suffix = word.substring(index + 1); 
    boolean match = false; 
    for (String setWord : words){ 
     // From the fastest to the slowest thing to check 
     // to get the best possible performances 
     if (setWord.length() == word.length() 
      && setWord.startsWith(prefix) 
      && setWord.endsWith(suffix)) { 
      match = true; 
      break; 
     } 
    } 
    if(match) 
     System.out.println("Its a match"); 
    else 
     System.out.println("Its not a match"); 
} 
else { 
    System.out.println("The word does not contain regex do other stuff"); 
}

來源

2016-09-24 08:40:33

'words.stream（）。anyMatch（w - > p.matcher（w）.matches（））'。 – saka1029

@ saka1029你是完全正確的，許多thx的評論 –

這將工作，但不幸的是，我仍然與Java 7一起工作。 –

就退出循環，以儘快停止你HashSet進一步的正則表達式匹配，你獲得匹配：

if(m.matches()) { 
    match = true; 
    break; 
}

全碼：

HashSet<String> words = new HashSet<String>();//this set is already populated 
String word = "t.st"; 
if(word.contains(".")){ 
    Pattern p = Pattern.compile(word); 
    Matcher m; 
    boolean match = false; 
    for(String setWord : words){ 
     m = p.matcher(setWord); 
     if(m.matches()) { 
      match = true; 
      break: 
     } 
    } 
    if(match) 
     System.out.println("Its a match"); 
    else 
     System.out.println("Its not a match"); 
} 
else{ 
    System.out.println("The word does not contain regex do other stuff"); 
}

來源

2016-09-24 08:31:53 anubhava

我試過使用break，它確實提高了搜索速度。但我正在尋找比我寫的更好的解決方案。 –

什麼是你的設置的大小？只要匹配't.st'設置成千上萬的項目應該不成問題。 – anubhava

它接近100,000 –

使用TreeSet而不是HashSet。並測試該組的子範圍。

TreeSet<String> words = new TreeSet<>();// this set is already populated 
String word = "t.st"; 
if (word.contains(".")) { 
    String from = word.replaceFirst("\\..*", ""); 
    String to = from + '\uffff'; 
    Pattern p = Pattern.compile(word); 
    Matcher m; 
    boolean match = false; 
    for (String setWord : words.subSet(from, to)) { 
     m = p.matcher(setWord); 
     if (m.matches()) { 
      match = true; 
      break; 
     } 
    } 
    if (match) 
     System.out.println("Its a match"); 
    else 
     System.out.println("Its not a match"); 
} else { 
    System.out.println("The word does not contain regex do other stuff"); 
}

在這種情況下words.subSet(from, to)只包含以「t」開頭的單詞。

來源

2016-09-24 08:57:43 saka1029

使用這樣的原始匹配方法。

static boolean match(String wild, String s) { 
    int len = wild.length(); 
    if (len != s.length()) 
     return false; 
    for (int i = 0; i < len; ++i) { 
     char w = wild.charAt(i); 
     if (w == '.') 
      continue; 
     else if (w != s.charAt(i)) 
      return false; 
    } 
    return true; 
}

和

HashSet<String> words = new HashSet<>();// this set is already populated 
String word = "t.st"; 
boolean match = false; 
if (word.contains(".")) { 
    for (String setWord : words) { 
     if (match(word, setWord)) { 
      match = true; 
      break; 
     } 
    } 
    if (match) 
     System.out.println("Its a match"); 
    else 
     System.out.println("Its not a match"); 
} else { 
    System.out.println("The word does not contain regex do other stuff"); 
}

來源

2016-09-24 10:35:13 saka1029

如何在HashSet中搜索時使用正則表達式

回答

相關問題