PHP html_entity_decode and trim confusion

I am trying to use strip_tags and trim to detect whether a string contains only empty HTML.

$description = '<p>&nbsp;</p>'; 

$output = trim(strip_tags(html_entity_decode($description, ENT_QUOTES, 'UTF-8'))); 

var_dump($output);

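Output: a string of length 2 that displays as a garbled character, rather than the empty string I expected.

My attempt at debugging this: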

$description = '<p>&nbsp;</p>'; 

$test = mb_detect_encoding($description); 
$test .= "\n"; 
$test .= trim(strip_tags(html_entity_decode($description, ENT_QUOTES, 'UTF-8'))); 
$test .= "\n"; 
$test .= html_entity_decode($description, ENT_QUOTES, 'UTF-8'); 

file_put_contents('debug.txt', $test);

Output in debug.txt:

ASCII 
  
<p> </p>

Asked 2015-11-03 by John Magnolia

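If you use var_dump(urlencode($output)) you will see that it prints string(6) "%C2%A0", so the remaining bytes are 0xC2 and 0xA0. Together these two bytes are the UTF-8 encoding of U+00A0, the non-breaking space. By default trim() only strips ASCII whitespace (space, tab, newline, carriage return, vertical tab and NUL), so it leaves these bytes in place. Make sure your file is saved as UTF-8 and that your HTTP headers declare UTF-8 as well.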
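As a rough sketch of how to confirm this without urlencode, the leftover bytes can be dumped as hex. The helper name remaining_bytes_hex is made up for this illustration and is not part of the original code; it assumes PHP 7 or later for the type declarations.

```php
<?php

// Hypothetical helper (not from the original post): decode entities, strip
// the tags, apply the default trim(), and return whatever bytes are left as hex.
function remaining_bytes_hex(string $html): string
{
    $text = strip_tags(html_entity_decode($html, ENT_QUOTES, 'UTF-8'));

    // Default trim() only removes " ", "\t", "\n", "\r", "\0" and "\x0B".
    return bin2hex(trim($text));
}

var_dump(remaining_bytes_hex('<p>&nbsp;</p>')); // string(4) "c2a0"
var_dump(remaining_bytes_hex('<p> </p>'));      // string(0) ""
```

The first call shows the two bytes of the non-breaking space surviving trim(), while an ordinary ASCII space in the second call is removed as expected.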

That said, to trim this character you can use a regular expression with the Unicode modifier (instead of trim):

Demo:

<?php 

$description = '<p>&nbsp;</p>'; 

$output = trim(strip_tags(html_entity_decode($description, ENT_QUOTES, 'UTF-8'))); 

var_dump(urlencode($output)); // string(6) "%C2%A0" 

// ------- 

$output = preg_replace('~^\s+|\s+$~', '', strip_tags(html_entity_decode($description, ENT_QUOTES, 'UTF-8'))); 

var_dump(urlencode($output)); // string(6) "%C2%A0" 

// ------- 

$output = preg_replace('~^\s+|\s+$~u', '', strip_tags(html_entity_decode($description, ENT_QUOTES, 'UTF-8'))); 
// Unicode! -----------------------^ 

var_dump(urlencode($output)); // string(0) ""

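Regex autopsy:

~ - the regex delimiter; it must appear before the pattern and again before the modifiers
^\s+ - the start of the string followed immediately by one or more whitespace characters, i.e. leading whitespace (^ means start of string, \s means a whitespace character, + means "match one or more times")
| - OR
\s+$ - one or more whitespace characters followed immediately by the end of the string, i.e. trailing whitespace
~ - the closing regex delimiter
u - regex modifier; the unicode modifier (PCRE_UTF8) is used here to make sure we replace Unicode whitespace characters.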
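Putting the pieces together, the check from the question could be wrapped in a small function along these lines. This is only a sketch: the name is_blank_html is an assumption for illustration, and it relies on the same pattern as the demo above.

```php
<?php

// Hypothetical helper (name chosen for this example): returns true when the
// HTML contains nothing but tags and whitespace, including non-breaking spaces.
function is_blank_html(string $html): bool
{
    $text = strip_tags(html_entity_decode($html, ENT_QUOTES, 'UTF-8'));

    // Same pattern as the demo: strip leading and trailing whitespace in
    // Unicode mode so that U+00A0 is treated as whitespace too.
    $trimmed = preg_replace('~^\s+|\s+$~u', '', $text);

    // preg_replace() returns null on failure (for example, invalid UTF-8
    // input with the u modifier); that case is reported as "not blank" here.
    return $trimmed === '';
}

var_dump(is_blank_html('<p>&nbsp;</p>'));      // bool(true)
var_dump(is_blank_html('<p>&nbsp;Hi&nbsp;</p>')); // bool(false)
```

Treating a failed preg_replace() as "not blank" is a design choice for this sketch; a caller that needs to distinguish malformed input could check for null separately.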

Answered 2015-11-03 12:07:28 by h2ooooooo

"Autopsy" is a great word in this context. – Martin
