Counting the words in a string that contains many special characters

-1

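I need to count the words in a string, but it is not an ordinary string: it contains many special character sequences such as <, /em, /p and so on. Because of that, most of the approaches I found on Stack Overflow do not work, so I need to define my own regular expression.

My plan is to use a regular expression to define what a word is and then count how many times a word occurs. Here is my definition of a word: it must start with a letter and end with one of : or , or ! or ? or ' or - or ) or . or "

This is how I defined my regular expression: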

pattern = Pattern.compile("^[a-zA-Z](:|,|!|?|'|-|)|.|")$"); 
matcher = pattern.matcher(line); 
while (matcher.find()) 
wordCount++;

But there is an error on the first line:

pattern = Pattern.compile("^[a-zA-Z](:|,|!|?|'|-|)|.|")$");

How can I fix this?

Source

2016-09-23 xiaoxin

Escape the double quote: `"^[a-zA-Z](:|,|!|?|'|-|)|.|\")$"` – Tushar

Show us sample input and output :) – TheLostMind

I'm confused: you show `"^[a-zA-Z](:|,|!|?|'|-|)|.|")$"` but you mention `"^a-zA-Z|.|")$"`. Which one are you trying to use? –

Does this help?

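import java.util.regex.Matcher;
import java.util.regex.Pattern;

String line = "so.this:is,what)you!wanted?";
int wordCount = 0;
// One or more letters (possessive), followed by exactly one terminator.
// The hyphen is placed last in the class so it is a literal, not a range such as '-,
Pattern pattern = Pattern.compile("[a-zA-Z]++[:,!?').\"-]");
Matcher matcher = pattern.matcher(line);
while (matcher.find()) {
    wordCount++;
}
System.out.println(wordCount); // Prints 6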
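
As a side note on why the pattern from the question fails: escaping the double quote only fixes the Java string literal. Inside the group, `?`, `)` and `.` are still regex metacharacters, so the expression is rejected when it is compiled. The sketch below checks that and shows one possible rewrite using a character class. The class and method names (`WordPatternCheck`, `compiles`, `countWords`) are made up for illustration, and the rewrite assumes a word is a run of ASCII letters followed by one of the terminators listed in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class WordPatternCheck {
    // The question's pattern with only the double quote escaped.
    static final String QUOTE_ESCAPED = "^[a-zA-Z](:|,|!|?|'|-|)|.|\")$";

    // Assumed rewrite: letters, then one terminator from the question's list.
    // Inside a character class ?, ) and . are literals; the trailing - is a literal too.
    static final Pattern WORD = Pattern.compile("[a-zA-Z]+[:,!?').\"-]");

    // Returns true if the regex compiles, false if it is syntactically invalid.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Counts non-overlapping matches of WORD in the given line.
    static int countWords(String line) {
        Matcher matcher = WORD.matcher(line);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(compiles(QUOTE_ESCAPED));                      // false
        System.out.println(countWords("so.this:is,what)you!wanted?"));   // 6
    }
}
```

Note that the `^` and `$` anchors are dropped in the rewrite: with them, the pattern could only match a whole line consisting of a single word, so `find()` would never count more than one.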

Source

2016-09-23 08:05:03

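In fact, you also want to strip tags such as <em> (HTML emphasis), which would otherwise be counted as words. If you also take into account full tags with attributes, such as <span font="Consolas">, it becomes easier to remove the tags altogether: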

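public static int wordCount(String s) {
    String text = s.replaceAll("<[A-Za-z/][^>]*>", " ") // Replace each tag with a space
            .replaceAll("[^\\p{L}\\p{M}\\d]+", " ")     // Runs of non-letters, non-marks, non-digits become one space
            .trim();                                     // No leading or trailing space (avoids empty words)
    // Splitting an empty string would still yield one element, so handle it explicitly
    return text.isEmpty() ? 0 : text.split(" ").length;
}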

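This is fairly inefficient because of the replaceAll calls and the trim. At the very least, precompiling the regular expressions and reusing the Pattern objects would be better, though it is probably not worth the effort.

Source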
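
For anyone who does want the precompiled variant, here is a minimal sketch. The class name `PrecompiledWordCounter` and the constant names are hypothetical, and the empty-string guard is an assumption added here so that input without any words returns 0.

```java
import java.util.regex.Pattern;

class PrecompiledWordCounter {
    // Compiled once and reused across calls instead of being recompiled by String.replaceAll.
    private static final Pattern TAG = Pattern.compile("<[A-Za-z/][^>]*>");
    private static final Pattern NON_WORD = Pattern.compile("[^\\p{L}\\p{M}\\d]+");

    static int wordCount(String s) {
        // Replace tags with a space so adjacent words do not merge.
        String noTags = TAG.matcher(s).replaceAll(" ");
        // Collapse every run of non-letter, non-mark, non-digit characters into one space.
        String text = NON_WORD.matcher(noTags).replaceAll(" ").trim();
        // An empty string has no words; split(" ") would otherwise report one.
        return text.isEmpty() ? 0 : text.split(" ").length;
    }

    public static void main(String[] args) {
        System.out.println(wordCount("<p>Hello, <em>world</em>!</p>")); // 2
    }
}
```

The behaviour is meant to match the method above; the only difference is that the two patterns are compiled a single time when the class is loaded.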

2016-09-23 08:26:57
