PHP regular expressions and preg_replace question

I am going through someone else's old code and am having trouble understanding it.

He has:

explode(' ', strtolower(preg_replace('/[^a-z0-9-]+/i', ' ', preg_replace('/\&#?[a-z0-9]{2,4}\;/', ' ', preg_replace('/<[^>]+>/', ' ', $texts)))));

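I think the first regex excludes a-z and 0-9. I am not sure what the second regex does, though. The third one matches anything inside '< >' except '>'.

The result is an array containing every word in the $texts variable, but I just don't know how the code produces it. I understand what preg_replace and the other functions do; I just don't understand how the whole process works.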


2013-03-19 FlyingCat

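That many nested preg_replace calls is just going to cause confusion – Scuzzy 2013-03-19 23:30:51

Break it up into three separate statements, using temporary variables, in the order they are processed. Then it becomes easier to follow. – mario 2013-03-19 23:31:15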

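The expression /[^a-z0-9-]+/i will match (and subsequently replace with a space) any run of characters other than a-z, 0-9 and the hyphen. The ^ in [^...] negates the character set it contains.

[^a-z0-9-] is any character that is not alphanumeric or a hyphen
+ means one or more of the preceding
/i makes the match case-insensitive, so A-Z is kept as well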
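
To make the effect of the negated class concrete, here is a minimal sketch. The sample string is hypothetical and only serves as an illustration; the pattern is the one from the question.

```php
<?php
// Hypothetical sample input, not taken from the original code.
$sample = 'Hello, World! e-mail_me @ 9am';

// Each run of characters outside a-z, A-Z, 0-9 and "-" becomes one space.
// The hyphen in "e-mail" survives; the underscore does not.
$cleaned = preg_replace('/[^a-z0-9-]+/i', ' ', $sample);

echo $cleaned, PHP_EOL; // Hello World e-mail me 9am
```

Because of the +, a run such as ", " or " @ " collapses into a single space rather than one space per character.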

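The expression /\&#?[a-z0-9]{2,4}\;/ matches an & optionally followed by a #, followed by two to four lowercase letters or digits, and ending with a ;. This will match HTML entities like &nbsp; or &#39;

&#? matches either & or &#, because the ? makes the preceding # optional. The & doesn't actually need to be escaped.
[a-z0-9]{2,4} matches between two and four alphanumeric characters (lowercase only, since this pattern has no /i flag)
; is a literal semicolon. It doesn't actually need to be escaped either.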
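
As a rough illustration of what this pattern does and does not catch, consider the sketch below. The sample string is made up; the pattern is the unescaped equivalent of the one in the question.

```php
<?php
// Hypothetical sample input containing three entities.
$sample = 'a&nbsp;b&#39;c&hellip;d';

// &nbsp; (4 letters) and &#39; (2 digits) fit the {2,4} limit and are replaced.
// &hellip; has 6 letters before the semicolon, so it is left untouched.
$cleaned = preg_replace('/&#?[a-z0-9]{2,4};/', ' ', $sample);

echo $cleaned, PHP_EOL; // a b c&hellip;d
```

So the pattern only removes short entities; longer names, and names containing uppercase letters, pass through to the next step.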

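The last one, as you partly suspected, will replace any tag like <tagname> or <tagname attr='value'> or </tagname> with a single space. Note that it matches the whole tag, not just the contents inside the <>.

< is a literal character
[^>]+ is every character up to, but not including, the next >
> is a literal character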
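
One way to confirm that the whole tag is matched, angle brackets included, is to list the matches with preg_match_all(). This is only a sketch on a made-up input string.

```php
<?php
// Hypothetical sample markup.
$sample = '<p class="intro">Hello <b>world</b></p>';

// Collect every full match of the tag pattern into $matches[0].
preg_match_all('/<[^>]+>/', $sample, $matches);

print_r($matches[0]);
// Array
// (
//     [0] => <p class="intro">
//     [1] => <b>
//     [2] => </b>
//     [3] => </p>
// )
```

Each match starts at a < and ends at the first > that follows, which is why opening tags with attributes and closing tags are all covered.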

I would really recommend rewriting this as three separate calls to preg_replace() rather than nesting them.

// Strips tags.
// Would be better done with strip_tags()!!
$texts = preg_replace('/<[^>]+>/', ' ', $texts);
// Removes HTML entities
$texts = preg_replace('/&#?[a-z0-9]{2,4};/', ' ', $texts);
// Removes remaining characters that are not alphanumeric or hyphens
$texts = preg_replace('/[^a-z0-9-]+/i', ' ', $texts);
// Lowercases and splits on spaces, as the original one-liner does
$array = explode(' ', strtolower($texts));
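
To see how the steps chain together, the sketch below traces one made-up input through the same three replacements. It also shows that explode() can return empty strings when the text starts or ends with a space; using preg_split() with PREG_SPLIT_NO_EMPTY is one possible way around that, offered here as a suggestion rather than as part of the original code.

```php
<?php
// Hypothetical input, chosen only to exercise all three patterns.
$texts = '<p>Hello,&nbsp;World!</p>';

$texts = preg_replace('/<[^>]+>/', ' ', $texts);           // ' Hello,&nbsp;World! '
$texts = preg_replace('/&#?[a-z0-9]{2,4};/', ' ', $texts); // ' Hello, World! '
$texts = preg_replace('/[^a-z0-9-]+/i', ' ', $texts);      // ' Hello World '

// explode() keeps the empty strings produced by the leading and trailing space.
$words = explode(' ', strtolower($texts));
// ['', 'hello', 'world', '']

// preg_split() with PREG_SPLIT_NO_EMPTY drops them.
$filtered = preg_split('/ +/', strtolower($texts), -1, PREG_SPLIT_NO_EMPTY);
// ['hello', 'world']
```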


2013-03-19 23:30:57

...matches an '&', optionally followed by a '#'? – 2013-03-19 23:32:43

@JanTuroň It has been clarified. – 2013-03-19 23:33:16

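This code looks like it...

strips HTML/XML tags (anything between < and >)
then anything that starts with & or &#, is 2-4 alphanumeric characters long, and ends with ;
then strips anything that is not alphanumeric or a dash

In order of nesting, innermost first: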

/<[^>]+>/ 

Match the character 「<」 literally «<» 
Match any character that is NOT a 「>」 «[^>]+» 
    Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+» 
Match the character 「>」 literally «>» 


/\&#?[a-z0-9]{2,4}\;/ 

Match the character 「&」 literally «\&» 
Match the character 「#」 literally «#?» 
    Between zero and one times, as many times as possible, giving back as needed (greedy) «?» 
Match a single character present in the list below «[a-z0-9]{2,4}» 
    Between 2 and 4 times, as many times as possible, giving back as needed (greedy) «{2,4}» 
    A character in the range between 「a」 and 「z」 «a-z» 
    A character in the range between 「0」 and 「9」 «0-9» 
Match the character 「;」 literally «\;» 


/[^a-z0-9-]+/i 

Options: case insensitive 

Match a single character NOT present in the list below «[^a-z0-9-]+» 
    Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+» 
    A character in the range between 「a」 and 「z」 «a-z» 
    A character in the range between 「0」 and 「9」 «0-9» 
    The character 「-」 «-»


2013-03-19 23:34:10 Scuzzy
