file_get_contents() breaks up UTF-8 characters

I am loading HTML from an external server. The HTML markup is UTF-8 encoded and contains characters such as ľ, š, č, ť, ž etc. When I load the HTML with file_get_contents() like this:

$html = file_get_contents('http://example.com/foreign.html');

it messes up the UTF-8 characters and loads Å, ¾, ¤ and similar nonsense instead of the correct UTF-8 characters.

How can I solve this?

UPDATE：

I tried both saving the HTML to a file and outputting it with UTF-8 encoding. Neither works, so it means file_get_contents() is already returning broken HTML.

UPDATE2：

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> 
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="sk" lang="sk"> 
<head> 

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> 
<meta http-equiv="Content-Style-Type" content="text/css" /> 
<meta http-equiv="Content-Language" content="sk" /> 
<title>Test</title> 

</head> 
<body> 


<?php 

$html = file_get_contents('http://example.com'); 
echo htmlentities($html); 

?> 

</body> 
</html>
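
One thing that may be worth checking in the snippet above, offered as a possibility rather than a confirmed diagnosis: before PHP 5.4, htmlentities() assumed ISO-8859-1 when no charset argument was passed, so every byte of a multi-byte UTF-8 character was turned into its own entity. The sketch below uses "ž" as an assumed sample character and passes the encoding explicitly to show the difference.

```php
<?php
$z = "\xC5\xBE"; // "ž" (U+017E) as UTF-8 bytes; sample input chosen for illustration

// Treating the two bytes as ISO-8859-1 yields one entity per byte.
echo htmlentities($z, ENT_QUOTES, 'ISO-8859-1'), "\n"; // &Aring;&frac34;

// Declaring UTF-8 leaves the character intact (it has no named HTML 4.01 entity).
var_dump(htmlentities($z, ENT_QUOTES, 'UTF-8') === $z); // bool(true)
```

The first output matches the "Å" and "¾" described in the question, which is why passing 'UTF-8' as the third argument is a cheap thing to rule out.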

2010-02-10 Richard Knop

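Are you outputting them as UTF-8? – 2010-02-10 12:23:43

Where are you viewing the loaded HTML? – 2010-02-10 12:24:06

I am not outputting it. I save it to a file and then read it. But that is irrelevant, because I tried outputting it as UTF-8 and it is still messed up. – 2010-02-10 12:24:57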

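Alright. I have found out that file_get_contents() is not causing the problem. There is a different cause, which I describe in another question. Silly me.

See this question: Why Does DOM Change Encoding?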
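
A quick way to check whether the bytes returned by file_get_contents() are intact is to validate them as UTF-8 and look at the raw hex. This is only a sketch: looks_like_utf8() is a hypothetical helper name, and the two byte strings stand in for downloaded content.

```php
<?php
// Hypothetical helper: reports whether a byte string is well-formed UTF-8.
function looks_like_utf8(string $bytes): bool
{
    return mb_check_encoding($bytes, 'UTF-8');
}

$good = "\xC5\xBE"; // "ž" encoded as UTF-8
$bad  = "\xBE";     // a lone continuation byte, not valid UTF-8

var_dump(looks_like_utf8($good)); // bool(true)
var_dump(looks_like_utf8($bad));  // bool(false)
echo bin2hex($good), "\n";        // c5be
```

If the check passes right after the download, the damage is happening in a later step, such as parsing or output.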

2010-02-10 13:05:31

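file_get_contents() DOES cause the problem. I have a JSON file that I open with file_get_contents(), but when I do a print_r() after loading the JSON, the unicode characters are there, yet they are not in the JSON. Running mb_convert_encoding() on the result of file_get_contents() solved the problem. – Reado 2017-05-09 09:02:57

$string = mb_convert_encoding($string, 'HTML-ENTITIES', "UTF-8"); solved it for me. – WEBjuju 2018-02-24 22:16:01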

function file_get_contents_utf8($fn) {
    $content = file_get_contents($fn);
    // Detect the source encoding (strict mode), then convert to UTF-8
    return mb_convert_encoding($content, 'UTF-8',
        mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true));
}

You can also try your luck with http://php.net/manual/en/function.mb-internal-encoding.php
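
For anyone who wants to try the idea on strings rather than files, here is a minimal sketch of a variant. It is not the function above: ensure_utf8() is a hypothetical name, it validates with mb_check_encoding() instead of relying on detection, and the ISO-8859-1 fallback is an assumption about what the source is likely to send.

```php
<?php
// Hypothetical helper: leave valid UTF-8 alone, otherwise convert from a fallback encoding.
function ensure_utf8(string $content, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($content, 'UTF-8')) {
        return $content; // already well-formed UTF-8, do not convert twice
    }
    return mb_convert_encoding($content, 'UTF-8', $fallback);
}

$latin1 = "caf\xE9";     // "café" as ISO-8859-1 bytes
$utf8   = "caf\xC3\xA9"; // "café" as UTF-8 bytes

var_dump(ensure_utf8($latin1) === $utf8); // bool(true)
var_dump(ensure_utf8($utf8) === $utf8);   // bool(true)
```

Checking validity first avoids converting content that is already UTF-8 a second time.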

2010-02-10 12:26:46 Gordon

This solution works great, thanks! – brentonstrine 2013-10-26 03:41:33

This should be marked as the best answer. Thanks, Gordon. – helpse 2014-07-21 16:54:59

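I think you simply have a double conversion of the character encoding there :D

It may be because you are opening an HTML document inside an HTML document. So you end up with something that looks like this: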

<!DOCTYPE html> 
<head> 
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> 
<title></title> 
</head> 
<body> 
<!DOCTYPE html> 
<head> 
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> 
<title>Test</title>.......

Using mb_detect_encoding may therefore lead you to other issues.

2012-11-10 18:59:00

I had a similar problem with the Polish language.

I tried:

$fileEndEnd = mb_convert_encoding($fileEndEnd, 'UTF-8', mb_detect_encoding($fileEndEnd, 'UTF-8', true));

I tried:

$fileEndEnd = utf8_encode ($fileEndEnd);

I tried:

$fileEndEnd = iconv("UTF-8", "UTF-8", $fileEndEnd);

And then:

$fileEndEnd = mb_convert_encoding($fileEndEnd, 'HTML-ENTITIES', "UTF-8");

This last one worked perfectly!
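
A note for newer PHP versions: using 'HTML-ENTITIES' as an mbstring encoding is deprecated as of PHP 8.2. A rough equivalent, sketched below, is mb_encode_numericentity(), which replaces non-ASCII characters with numeric entities. utf8_to_numeric_entities() is a hypothetical helper name, and note that the output uses numeric entities only, never named ones.

```php
<?php
// Hypothetical helper: replace every non-ASCII character with a numeric entity.
function utf8_to_numeric_entities(string $utf8): string
{
    // Map format: [first code point, last code point, offset, mask].
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($utf8, $map, 'UTF-8');
}

// "Zażółć" written with escapes so the example does not depend on the file's encoding.
echo utf8_to_numeric_entities("Za\u{017C}\u{00F3}\u{0142}\u{0107}"), "\n";
// Za&#380;&#243;&#322;&#263;
```

Because the result is pure ASCII, it survives later steps that guess the encoding wrongly.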

2013-03-03 08:20:40 ugniesdebesys

+1 for 'HTML-ENTITIES' – Raptor 2013-03-06 09:30:47

Awesome, this solved it for me. – 2013-04-30 11:32:41

You made my day. – vikingmaster 2014-04-13 23:04:41

Try this too:

$url = 'http://www.domain.com/';
$html = file_get_contents($url);

// iconv(from, to, string): convert from UTF-8 to ISO-8859-1,
// transliterating characters that have no ISO-8859-1 equivalent
$html = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $html);
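
Since the argument order of iconv() is easy to mix up, here is a small sketch of both directions on a known byte string. The sample string is an assumption for illustration, and //TRANSLIT is left out here because the sample needs no transliteration.

```php
<?php
$latin1 = "caf\xE9"; // "café" as ISO-8859-1 bytes

// iconv(from, to, string): ISO-8859-1 input converted to UTF-8.
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);
echo bin2hex($utf8), "\n"; // 636166c3a9

// The reverse direction: UTF-8 input converted back to ISO-8859-1.
$back = iconv('UTF-8', 'ISO-8859-1', $utf8);
var_dump($back === $latin1); // bool(true)
```

Which direction you need depends on what the remote server actually sends and what your page declares.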

2014-11-19 13:55:28 Mohamm6d

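For Turkish, neither mb_convert_encoding nor any other charset conversion worked.

urlencode did not work either, because it converts the space character to a + character. For percent-encoding it has to be %20.

This one worked!

$url = rawurlencode($url);
$url = str_replace("%3A", ":", $url);
$url = str_replace("%2F", "/", $url);

$data = file_get_contents($url);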
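
The three steps above can be wrapped into one function, sketched here with a hypothetical name, encode_url_keep_separators(), and made-up example URLs. It only restores ":" and "/", so a URL with a query string ("?", "&", "=") would need those restored as well.

```php
<?php
// Hypothetical helper wrapping the rawurlencode + str_replace steps.
function encode_url_keep_separators(string $url): string
{
    $encoded = rawurlencode($url); // encodes everything; a space becomes %20
    // Put the scheme and path separators back.
    return str_replace(['%3A', '%2F'], [':', '/'], $encoded);
}

echo encode_url_keep_separators('http://example.com/my file.html'), "\n";
// http://example.com/my%20file.html

var_dump(urlencode('a b'));    // string(3) "a+b"
var_dump(rawurlencode('a b')); // string(5) "a%20b"
```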

2016-10-26 08:24:31

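I am processing 35,000 lines of data.

$f = fopen("veri1.txt", "r");
$i = 0; // number of lines processed
// Loop until fgets() returns false, so a failed read at end of file is never converted
while (($line = fgets($f)) !== false) {
    $i++;
    echo mb_convert_encoding($line, 'HTML-ENTITIES', "UTF-8");
}
fclose($f);

This code converts my strange characters into normal ones.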

2017-11-15 10:49:54 matasoy
