Allowing users to submit HTML in PHP

I want to allow a lot of user-submitted HTML in user profiles. I currently try to filter out what I don't want, but I now want to switch to a whitelist approach.

Here is my current, non-whitelist approach:

function FilterHTML($string) { 
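    // magic quotes were removed in PHP 8, so only call the check where it still exists 
    if (function_exists('get_magic_quotes_gpc') && get_magic_quotes_gpc()) { 
     $string = stripslashes($string); 
    } 
    $string = html_entity_decode($string, ENT_QUOTES, "ISO-8859-1"); 
    // convert decimal (the /e modifier was removed in PHP 7, so a callback is used instead) 
    $string = preg_replace_callback('/&#(\d+)/m', function ($m) { return chr((int) $m[1]); }, $string); // decimal notation 
    // convert hex 
    $string = preg_replace_callback('/&#x([a-f0-9]+)/mi', function ($m) { return chr((int) hexdec($m[1])); }, $string); // hex notation 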
    //$string = html_entity_decode($string, ENT_COMPAT, "UTF-8"); 
    $string = preg_replace('#(&\#*\w+)[\x00-\x20]+;#U', "$1;", $string); 
    $string = preg_replace('#(<[^>]+[\s\r\n\"\'])(on|xmlns)[^>]*>#iU', "$1>", $string); 
    //$string = preg_replace('#(&\#x*)([0-9A-F]+);*#iu', "$1$2;", $string); //bad line 
    $string = preg_replace('#/*\*()[^>]*\*/#i', "", $string); // REMOVE /**/ 
    $string = preg_replace('#([a-z]*)[\x00-\x20]*([\`\'\"]*)[\\x00-\x20]*j[\x00-\x20]*a[\x00-\x20]*v[\x00-\x20]*a[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iU', '...', $string); //JAVASCRIPT 
    $string = preg_replace('#([a-z]*)([\'\"]*)[\x00-\x20]*v[\x00-\x20]*b[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iU', '...', $string); //VBSCRIPT 
    $string = preg_replace('#([a-z]*)[\x00-\x20]*([\\\]*)[\\x00-\x20]*@([\\\]*)[\x00-\x20]*i([\\\]*)[\x00-\x20]*m([\\\]*)[\x00-\x20]*p([\\\]*)[\x00-\x20]*o([\\\]*)[\x00-\x20]*r([\\\]*)[\x00-\x20]*t#iU', '...', $string); //@IMPORT 
    $string = preg_replace('#([a-z]*)[\x00-\x20]*e[\x00-\x20]*x[\x00-\x20]*p[\x00-\x20]*r[\x00-\x20]*e[\x00-\x20]*s[\x00-\x20]*s[\x00-\x20]*i[\x00-\x20]*o[\x00-\x20]*n#iU', '...', $string); //EXPRESSION 
    $string = preg_replace('#</*\w+:\w[^>]*>#i', "", $string); 
    $string = preg_replace('#</?t(able|r|d)(\s[^>]*)?>#i', '', $string); // strip out tables 
    $string = preg_replace('/(potspace|pot space|rateuser|marquee)/i', '...', $string); // filter some words 
    //$string = str_replace('left:0px; top: 0px;','',$string); 
    do { 
     $oldstring = $string; 
     //bgsound| 
     $string = preg_replace('#</*(applet|meta|xml|blink|link|script|iframe|frame|frameset|ilayer|layer|title|base|body|xml|AllowScriptAccess|big)[^>]*>#i', "...", $string); 
    } while ($oldstring != $string); 
    return addslashes($string); 
}

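The above works pretty well; I have not had any problems with it in 2 years of use. For a whitelist approach, though, is there anything in PHP similar to Stack Overflow's C# method? http://refactormycode.com/codes/333-sanitize-html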


2009-09-04 JasonDavis

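HTML Purifier is a standards-compliant HTML filter library written in PHP. HTML Purifier will not only remove all malicious code (better known as XSS) with a thoroughly audited, secure yet permissive whitelist, it will also make sure your documents are standards-compliant, something only achievable with a comprehensive knowledge of W3C's specifications.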
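As a rough idea of what using it looks like, here is a minimal sketch. It assumes the library was installed with Composer (package ezyang/htmlpurifier) and that the autoloader sits in vendor/ next to the script. The wrapper name purifyProfileHtml and the particular whitelist are only illustrative, not something from the question.

```php
<?php
// Assumption: HTML Purifier is available through Composer's autoloader.
if (is_file(__DIR__ . '/vendor/autoload.php')) {
    require_once __DIR__ . '/vendor/autoload.php';
}

// Hypothetical wrapper; the name and the whitelist are illustrative only.
function purifyProfileHtml(string $dirty): string
{
    $config = HTMLPurifier_Config::createDefault();

    // Whitelist of elements, with the attributes allowed on each one.
    $config->set('HTML.Allowed', 'b,i,u,strong,em,a[href|title]');

    // Only these URL schemes are accepted in links.
    $config->set('URI.AllowedSchemes', array(
        'http'   => true,
        'https'  => true,
        'mailto' => true,
    ));

    $purifier = new HTMLPurifier($config);
    return $purifier->purify($dirty);
}
```

Calling purifyProfileHtml() on the submitted profile text before storing or displaying it should leave only the listed tags and attributes; anything else, including event handlers and script elements, is removed by the library.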


2009-09-04 01:52:22 raspi

With PHP, this is really the way to go. Its output is amazing, and safe. – DGM 2009-09-04 04:01:22

I have seen this before, but it seemed really bulky to me. I will check it out again though, thanks. – JasonDavis 2009-09-04 14:27:14

I searched for what I needed for about half an hour until I came across your post! :-) Thanks – 2017-08-09 11:05:59

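It may be safer to parse it properly with DOMDocument, remove the disallowed tags with removeChild(), and then get the result. Filtering with regular expressions is not always safe, especially once things get this complex. Attackers can find a way to trick your filter; forums and social networks know this very well.

For example, browsers ignore whitespace after the <. Your regex filters <script, but if I use < script ... big FAIL!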
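The DOMDocument idea could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the whitelist layout (tag name mapped to its allowed attributes), and the rule that only absolute http(s) and mailto links are kept are all made up for the example. Disallowed elements are unwrapped so their text survives, while script and style are removed together with their content.

```php
<?php
// A sketch only: function names, whitelist and URL rule are assumptions.

function sanitizeHtmlFragment(string $html, array $allowedTags): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    // The XML prolog is a common trick to make loadHTML() read the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"><body>' . $html . '</body>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    cleanChildren($body, $allowedTags);

    // Serialize only the children of <body>, i.e. the cleaned fragment.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

function cleanChildren(DOMNode $parent, array $allowedTags): void
{
    // Copy the live node list first, because the loop modifies the tree.
    foreach (iterator_to_array($parent->childNodes, false) as $child) {
        if ($child instanceof DOMText) {
            continue; // plain text is kept; saveHTML() escapes it again
        }
        if (!($child instanceof DOMElement)) {
            $parent->removeChild($child); // comments, processing instructions
            continue;
        }
        $tag = strtolower($child->tagName);
        if (in_array($tag, array('script', 'style'), true)) {
            $parent->removeChild($child); // drop the element and its content
            continue;
        }
        cleanChildren($child, $allowedTags); // clean the subtree first
        if (!array_key_exists($tag, $allowedTags)) {
            // Unwrap: keep the already cleaned children, drop the tag itself.
            while ($child->firstChild !== null) {
                $parent->insertBefore($child->firstChild, $child);
            }
            $parent->removeChild($child);
            continue;
        }
        // Remove every attribute that is not whitelisted for this tag.
        foreach (iterator_to_array($child->attributes, false) as $attr) {
            $name = strtolower($attr->nodeName);
            $ok = in_array($name, $allowedTags[$tag], true);
            if ($ok && $name === 'href') {
                // Only absolute http(s) and mailto links survive in this sketch.
                $ok = preg_match('#^(https?://|mailto:)#i', trim($attr->nodeValue)) === 1;
            }
            if (!$ok) {
                $child->removeAttributeNode($attr);
            }
        }
    }
}

// Illustrative whitelist and input.
$allowedTags = array(
    'b' => array(), 'i' => array(), 'u' => array(),
    'strong' => array(), 'em' => array(),
    'a' => array('href', 'title'),
);
$dirty = '<p onclick="x()">Hi <b onmouseover="y()">there</b>'
       . '<script>alert(1)</script>'
       . '<a href="javascript:alert(2)" title="t">link</a></p>';
$clean = sanitizeHtmlFragment($dirty, $allowedTags);
echo $clean, "\n";
```

Because the markup is parsed before anything is decided, tricks that rely on odd spacing or casing inside a tag have to get past the parser rather than past a pattern. A maintained library is still likely to cover more edge cases than a short sketch like this.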


2009-09-04 01:47:16 Havenard

-1

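Achieving this is quite simple: you just check for any tags that are not in a list of whitelisted tags and remove them from the source. It can be done quite easily with a single regular expression.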

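function sanitize($html) { 
    // Tags that are allowed to remain in the markup 
    $whitelist = array(
    'b', 'i', 'u', 'strong', 'em', 'a' 
); 

    // Remove every opening or closing tag whose name is not whitelisted. 
    // A negative lookahead is used because "(^...)" inside a group does not negate anything. 
    // Note: attributes on the allowed tags are left untouched. 
    $allowed = implode("|", array_map('preg_quote', $whitelist)); 
    return preg_replace('#</?(?!(?:'.$allowed.')\b)[a-z][^>]*>#i', "", $html); 
}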

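I haven't tested this and there may be a bug in it, but you get the idea of how it works. You might also want to look at using a formatting language such as Textile or Markdown.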

Jamie


2009-09-04 01:48:36

HTML Purifier is the best HTML parser/sanitizer.


2009-09-04 01:49:55 MiffTheFox

You can just use the strip_tags() function.

Since the function is defined as

string strip_tags (string $str [, string $allowable_tags ])

you can do this:

$html = $_POST['content']; 
$html = strip_tags($html, '<b><a><i><u><span>');

However, note that with strip_tags you will not be able to filter out attributes. For example:

<a href="javascript:alert('haha caught cha!');">link</a>
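To make the limitation concrete, here is a small sketch with a made-up input string. It shows that strip_tags() drops the tags that are not allowed but leaves the attributes of the allowed ones exactly as submitted.

```php
<?php
// Illustrative input; none of it comes from a real profile.
$input = '<script>alert(1)</script>'
       . '<b onmouseover="alert(2)">bold</b> '
       . '<a href="javascript:alert(3)">link</a>';

// Only <b> and <a> are allowed through.
$output = strip_tags($input, '<b><a>');

// The <script> tags are gone, but the event handler and the
// javascript: URL on the allowed tags are still present.
var_dump($output);
```

So strip_tags() on its own is only suitable when no tags at all are allowed, or when the attributes are cleaned in a separate step.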


2009-09-04 03:25:36 mauris

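Try the function "getCleanHTML" below. It extracts the text content of every element, except for elements whose tag name is in the whitelist, which are kept as they are. The code is clean and easy to understand and debug.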

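<?php 

// Tag names that may pass through unchanged 
$TagWhiteList = array(
    'b', 'i', 'u', 'strong', 'em', 'a', 'img' 
); 

// Serialize a single node (with its attributes and children) to HTML 
function getHTMLCode($Node) { 
    $Document = new DOMDocument();  
    $Document->appendChild($Document->importNode($Node, true)); 
    return $Document->saveHTML(); 
} 

// Append the cleaned markup of $Node and its subtree to $Text 
function getCleanHTML($Node, $Text = "") { 
    global $TagWhiteList; 

    // Text nodes: escape again, because textContent returns decoded characters 
    if ($Node instanceof DOMText) 
        return $Text.htmlspecialchars($Node->textContent, ENT_QUOTES, 'UTF-8'); 

    // Comments, processing instructions and the like are dropped 
    if (!($Node instanceof DOMElement)) 
        return $Text; 

    // Whitelisted elements are kept as they are. 
    // Caution: their attributes and descendants are not filtered here. 
    if (in_array($Node->tagName, $TagWhiteList)) 
        return $Text.getHTMLCode($Node); 

    // Any other element is replaced by the cleaned content of its children 
    foreach ($Node->childNodes as $Child) 
        $Text = getCleanHTML($Child, $Text); 

    return $Text; 
} 

$Doc = new DOMDocument(); 
$Doc->loadHTMLFile("Test.html"); 
echo getCleanHTML($Doc->documentElement)."\n"; 

?>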

Hope this helps.


2009-09-04 03:38:32 NawaMan

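For those suggesting just using strip_tags... be aware: strip_tags does NOT strip out tag attributes, and broken tags will also mess it up.

From the manual page:

Warning: Because strip_tags() does not actually validate the HTML, partial or broken tags can result in the removal of more text/data than expected.

Warning: This function does not modify any attributes on the tags that you allow using allowable_tags, including the style and onmouseover attributes that a mischievous user may abuse when posting text that will be shown to other users.

You can't rely on just this one solution.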
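The first warning is easy to run into with ordinary text. A short sketch, using a made-up sentence rather than anything from the manual: a "<" that is immediately followed by a non-space character and never closed is treated as the start of a tag, so the rest of the string is removed with it.

```php
<?php
// Illustrative input: the "<3" is meant as text, not as markup.
$text = 'I <3 PHP and everything after it';

// No ">" ever closes the supposed tag, so strip_tags() drops
// everything from the "<" up to the end of the string.
$stripped = strip_tags($text);

var_dump($stripped);
```

Escaping such text with htmlspecialchars() before display, or running it through a real HTML parser, avoids losing content this way.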


2009-09-04 16:27:28
