DOCX encoding problem

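I have a PHP script that reads information from a MySQL database and puts it into a DOCX file using a template. The template contains placeholders of the form <<<variable_name>>>, where variable_name is the name of a MySQL field.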

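DOCX files are Zip archives, so my PHP script uses the ZipArchive library to open the DOCX and edit the document.xml file, replacing the placeholders with the correct data.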

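This worked fine until today, when I ran into some encoding problems. Any non-ANSI characters are not encoded correctly and corrupt the output DOCX. MS Word gives the error message "Illegal XML character".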

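When I unzip the document and open document.xml in Notepad++, I can see the offending characters. If I go to the Encoding menu and select "Encode in ANSI", the characters display normally: they are pound (£) signs. When N++ is set to "Encode in UTF-8", they show up as hexadecimal values.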

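If I select the N++ option "Convert to UTF-8", the characters display OK in UTF-8 and MS Word opens the document with the "£" characters appearing correctly. But I don't want to manually unzip my DOCX archives every time I create one; the whole point of the script is to generate documents quickly and easily.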

My code (partly copied from another question on SO):

if (!copy($source, $target)) // make a duplicate so we dont overwrite the template 
    print "Could not duplicate template.\n"; 
$zip = new ZipArchive(); 
if ($zip->open($target, ZIPARCHIVE::CHECKCONS) !== TRUE) 
    print "Source is not a docx.\n"; 
$content_file = substr($source, -4) == '.odt' ? 'content.xml' : 'word/document.xml'; 
$file_contents = $zip->getFromName($content_file); 

// Code here to process the file, get list of substitutions to make 

foreach ($matches[0] as $x => $variable) 
{ 
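    // Quote the placeholder so any regex metacharacters in it are matched literally.
    $find[$x] = '/' . preg_quote($matches[0][$x], '/') . '/'; 
    // Variable variable: look up the variable named after the MySQL field.
    $replace[$x] = ${$matches[1][$x]}; 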
} 
$file_contents = preg_replace($find, $replace, $file_contents, -1, $count); 

$zip->deleteName($content_file); 
$zip->addFromString($content_file, $file_contents); 
$zip->close(); 

chmod($target, 0777);

I have tried:

$file_contents = iconv("Windows-1252", "UTF-8", $file_contents);

and:

$file_contents_utf8 = utf8_encode($file_contents_utf8);

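in an attempt to make the PHP script encode the file as UTF-8.

How can I make the PHP script encode the file as UTF-8 when saving it with the ZipArchive library?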
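
One direction to explore, as a minimal sketch rather than a confirmed fix: convert only the values being substituted, not the whole file. This assumes (the thread does not confirm it) that the template XML is already UTF-8 and that only the MySQL values arrive as Windows-1252. The helper names `to_utf8_xml` and `fill_placeholders` are hypothetical, as is the guess that a placeholder is stored in document.xml in XML-escaped form. The sketch assumes PHP 7 or later with the iconv extension.

```php
<?php
// Hypothetical helper: make one database value safe to splice into UTF-8 XML.
function to_utf8_xml(string $value): string
{
    // preg_match() with the /u flag does not return 1 when the subject is not valid UTF-8.
    if (preg_match('//u', $value) !== 1) {
        // Assumption: anything that is not valid UTF-8 is Windows-1252.
        $converted = iconv('Windows-1252', 'UTF-8', $value);
        if ($converted === false) {
            throw new RuntimeException('Value could not be converted to UTF-8.');
        }
        $value = $converted;
    }
    // Escape &, <, >, " and ' so the value cannot break the XML markup.
    return htmlspecialchars($value, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

// Hypothetical helper: replace literal placeholders without regular expressions.
function fill_placeholders(string $xml, array $values): string
{
    $map = [];
    foreach ($values as $placeholder => $value) {
        $map[$placeholder] = to_utf8_xml((string) $value);
    }
    return strtr($xml, $map);
}

// Assumed placeholder form: <<<price>>> as it would look once XML-escaped.
$xml = '<w:t>Price: &lt;&lt;&lt;price&gt;&gt;&gt;</w:t>';
$out = fill_placeholders($xml, ['&lt;&lt;&lt;price&gt;&gt;&gt;' => "\xA3" . '5 & up']);
echo $out, "\n";
```

Using strtr() instead of preg_replace() sidesteps a separate pitfall: preg_replace() treats sequences such as `$1` in the replacement string as backreferences, which can mangle values that contain currency amounts.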

Source

2016-08-11 harry_p_6

Don't use any conversion functions; use UTF-8 everywhere.

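Let's check whether you really have UTF-8: in PHP, apply the bin2hex() function to a string that supposedly contains £. You should see C2A3, which is the UTF-8 hex encoding of £.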
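
The check described above can be sketched as follows. The variable names are illustrative only, and the byte strings are written with escape sequences so the result does not depend on the encoding of the script file itself. Note that bin2hex() prints lowercase hex digits.

```php
<?php
// The pound sign as UTF-8 (two bytes) and as Windows-1252 (one byte).
$utf8Pound = "\xC2\xA3";
$cp1252Pound = "\xA3";

echo bin2hex($utf8Pound), "\n";   // c2a3
echo bin2hex($cp1252Pound), "\n"; // a3

// The same idea applied to a longer string: look for the UTF-8 byte pair directly.
$sample = 'Total: ' . $utf8Pound . '10';
$hasUtf8Pound = strpos($sample, "\xC2\xA3") !== false;
var_dump($hasUtf8Pound); // bool(true)
```

If the dump of a value read from the database shows a lone a3 instead of c2a3, the string is not UTF-8 at that point in the script.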

Source

2016-08-13 20:10:48
