Regex matching

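How can I match words but not digits in a culture-independent way?

\w matches word characters, digits included, but I want to ignore digits. So for "111 or this", \w\s will not work.

I want to get only "or this". I don't think {^[A-Za-z]+$} is the solution, because the German alphabet has some extra letters.

Source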

2011-11-27 Nickolodeon

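Should "or this" be treated as one match or two? –

I want to get matches for the pattern "word1 word2". Note that "mark1 is 1" should give me 1 match, "mark1 is". Also, "my birthday is 11/08/2000" should match on "my birthday" and "birthday is" (the date should not be matched). – Nickolodeon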

-1

I think the regular expression should be [^\d\s]+, i.e. one or more characters that are neither digits nor whitespace.

Source

2011-11-27 19:12:33 bozdoz

This should work for matching words:

\b[^\d\s]+\b

Breakdown:

\b - word boundary 
[ - start of character class 
^ - negation within character class 
\d - numerals 
\s - whitespace 
] - end of character class 
+ - repeat previous character one or more times 
\b - word boundary

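This will match any run of characters delimited by word boundaries that contains no digits or whitespace (so "words" like "aa?aa!aa" will also be matched).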
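To make that behaviour concrete, here is a minimal sketch of what \b[^\d\s]+\b returns for a few inputs, including the ones from the question. The helper name NonDigitRuns is made up for this illustration, and the snippet assumes a C# 9+ project (top-level statements).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every match of \b[^\d\s]+\b in the input.
static string[] NonDigitRuns(string input) =>
    Regex.Matches(input, @"\b[^\d\s]+\b")
         .Cast<Match>()
         .Select(m => m.Value)
         .ToArray();

// Digits are skipped, plain words are kept.
Console.WriteLine(string.Join(" | ", NonDigitRuns("111 or this")));  // or | this

// Trailing punctuation is dropped, because the final \b must sit next to a word character.
Console.WriteLine(string.Join(" | ", NonDigitRuns("aaa? aaa!")));    // aaa | aaa

// Punctuation inside a run is kept, since only digits and whitespace are excluded.
Console.WriteLine(string.Join(" | ", NonDigitRuns("aa?aa!aa")));     // aa?aa!aa

// "mark1" yields nothing: there is no word boundary between "mark" and "1".
Console.WriteLine(string.Join(" | ", NonDigitRuns("mark1 is 1")));   // is
```

The third call shows the weak spot: interior punctuation passes through untouched.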

Alternatively, if you want to exclude those as well, you can use:

\b[\p{L}\p{M}]+\b

Breakdown:

\b - word boundary 
[  - start of character class 
\p{L} - single code point in the category "letter" 
\p{M} - code point that is a combining mark (such as diacritics) 
]  - end of character class 
+  - repeat previous character one or more times 
\b - word boundary

Source

2011-11-27 19:12:47 Oded

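Good call. I've never used word boundaries before. Now I will. :) – bozdoz

This will also match "words" like "aaa?", "aaa!", "aaa#", etc. – mifki

@mifki - The punctuation is not matched. You would need to use something other than '\b' to include it. – Oded

I would suggest using this: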

foundMatch = Regex.IsMatch(SubjectString, @"\b[\p{L}\p{M}]+\b");

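This matches only Unicode letters (and combining marks).

While @Oded's answer would also work, it also matches this: p+ü+üü++üüü++ü, which is not exactly a word.

Explanation: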

" 
\b    # Assert position at a word boundary 
[\p{L}\p{M}] # Match a single character present in the list below 
        # A character with the Unicode property "letter" (any kind of letter from any language) 
        # A character with the Unicode property "mark" (a character intended to be combined with another character (e.g. accents, umlauts, enclosing boxes, etc.)) 
    +    # Between one and unlimited times, as many times as possible, giving back as needed (greedy) 
\b    # Assert position at a word boundary 
"
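Regex.IsMatch only reports whether a word is present. To actually pull the words out, something like the following sketch could be used. LetterWords is a name invented for this example, and the snippet assumes C# 9+ top-level statements. Non-ASCII letters are written as \u escapes so the file encoding does not matter.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every run of Unicode letters / combining marks
// that is delimited by word boundaries.
static string[] LetterWords(string input) =>
    Regex.Matches(input, @"\b[\p{L}\p{M}]+\b")
         .Cast<Match>()
         .Select(m => m.Value)
         .ToArray();

// The input from the question: the number is ignored.
Console.WriteLine(string.Join(" | ", LetterWords("111 or this")));            // or | this

// German letters are covered: "Größe 42 Straße".
Console.WriteLine(string.Join(" | ", LetterWords("Gr\u00F6\u00DFe 42 Stra\u00DFe")));  // Größe | Straße

// The string p+ü+üü++üüü++ü is split into its letter runs instead of matching as a whole.
Console.WriteLine(LetterWords("p+\u00FC+\u00FC\u00FC++\u00FC\u00FC\u00FC++\u00FC").Length);  // 5

// Interior punctuation now separates words.
Console.WriteLine(string.Join(" | ", LetterWords("aa?aa!aa")));               // aa | aa | aa
```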

Source

2011-11-27 19:32:51 FailedDev

You also need to include '\p{M}', because accents may be encoded as separate code points. – mifki

@mifki +1 Thanks for the pointer. – FailedDev

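Use the expression \b[\p{L}\p{M}]+\b. It uses a lesser-known notation for matching Unicode characters (code points) of a given category. \p{L} matches any letter, and \p{M} matches any combining mark. The latter is needed because an accented character is sometimes encoded as two code points (the letter itself plus a combining mark), and in that case \p{L} alone would match only one of them.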
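The two-code-point case can be reproduced with a short sketch. The sample word and the helper FirstMatch are made up for this illustration, and the snippet assumes C# 9+ top-level statements. The "ü" is deliberately written as 'u' followed by U+0308 (COMBINING DIAERESIS).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: value of the first match, or "" when nothing matches.
static string FirstMatch(string input, string pattern) =>
    Regex.Match(input, pattern).Value;

// "über" in decomposed form: 5 code points instead of 4.
string decomposed = "u\u0308ber";
Console.WriteLine(decomposed.Length);                                    // 5

// \p{L} alone stops in front of the combining mark.
Console.WriteLine(FirstMatch(decomposed, @"\p{L}+"));                    // u

// Adding \p{M} lets the match run across the mark and cover the whole word.
Console.WriteLine(FirstMatch(decomposed, @"[\p{L}\p{M}]+").Length);      // 5
Console.WriteLine(FirstMatch(decomposed, @"\b[\p{L}\p{M}]+\b").Length);  // 5
```

With the precomposed form (a single U+00FC code point) both patterns would match the whole word, so the difference only shows up for decomposed input.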

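Also note that this is a general expression for matching words that may contain international characters. If you need, for example, to match several words at once or to allow words ending in digits, you will have to modify the pattern accordingly.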
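As one possible modification (a sketch, not part of the answer above), overlapping "word1 word2" pairs like the ones asked for in the question's comments could be collected with a lookahead. Two assumptions are made here: a word starts with a letter and may continue with letters, marks or decimal digits (so "mark1" counts as a word but "11" does not), and the helper name WordPairs is invented. The snippet assumes C# 9+ top-level statements.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every pair of adjacent words, overlapping pairs included.
static string[] WordPairs(string input)
{
    // Assumed definition of a word: a letter, then letters, combining marks or digits.
    const string word = @"\p{L}[\p{L}\p{M}\p{Nd}]*";

    // The lookahead consumes nothing, so the second word of one pair
    // can become the first word of the next pair.
    string pattern = @"\b(?=(" + word + @"\s+" + word + @")\b)";

    return Regex.Matches(input, pattern)
                .Cast<Match>()
                .Select(m => m.Groups[1].Value)
                .ToArray();
}

Console.WriteLine(string.Join(" | ", WordPairs("mark1 is 1")));
// mark1 is

Console.WriteLine(string.Join(" | ", WordPairs("My birthday is 11/08/2000")));
// My birthday | birthday is
```

The pair text is read from capture group 1, because the overall match of a pure lookahead is empty.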

Source

2011-11-27 19:45:46 mifki

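+1. Didn't know the \p{M} trick :) – FailedDev

OK, but why? You should always try to explain why your solution works when the OP's didn't. Drive-by answers like this are not welcome here on SO. –

@AlanMoore I explained it in my comment on FailedDev's answer. I'll update my answer as well. – mifki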
