Loading and saving an HTML file with Polish characters

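I need to load an HTML template file (using std::ifstream), add some content, and then save it as a complete web page. It would be simple enough if it weren't for Polish characters. I've tried every combination of char/wchar_t, Unicode/Multi-Byte character set, iso-8859-2/utf-8, and ANSI/utf-8, and none of them worked for me: there were always some characters displayed incorrectly (or some not displayed at all).

I could paste a lot of code and files here, but I'm not sure that would even help. Maybe you can just tell me: what format/encoding should the template file have, what encoding should I declare in it for the web page, and how should I load and save the file to get proper results?

(If my question isn't specific enough, or you do need code/file examples, let me know.)

Edit: I've tried the library suggested in the comments: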

std::string fix_utf8_string(std::string const & str) 
{ 
    std::string temp; 
    utf8::replace_invalid(str.begin(), str.end(), back_inserter(temp)); 
    return temp; // return the repaired copy, not the untouched input
}

Calling:

fix_utf8_string("wynik działania pozytywny ąśżźćńłóę");

throws utf8::not_enough_room. What am I doing wrong?

2013-04-30 NPS

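Take a look at [this](http://utfcpp.sourceforge.net/) library. – 2013-04-30 09:53:38

@bash.d Please see the edit to my question. – NPS 2013-05-02 18:14:27

@bash.d Unfortunately, that library doesn't work for me at all (even when no exception is thrown, it still doesn't convert the characters correctly). – NPS 2013-05-02 23:51:22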
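A possible explanation for the exception, assuming the source file containing the literal was saved as Windows-1250 rather than UTF-8: in that code page the final 'ę' is the single byte 0xEA, and in UTF-8 a byte of that form announces a three-byte sequence. With nothing after it, a validator runs out of input, which is what the name utf8::not_enough_room suggests. Note also that replace_invalid only replaces invalid sequences; it does not convert text from another code page. The sketch below does not use the library. utf8_sequence_length and last_byte_expects_more are hypothetical helper names that only illustrate the lead-byte rule.

```cpp
#include <cstddef>
#include <string>

// Length of the UTF-8 sequence announced by a lead byte.
// Returns 0 for a byte that cannot start a sequence (e.g. a continuation byte).
// Hypothetical helper, not part of any library.
std::size_t utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80) return 1;          // 0xxxxxxx: ASCII
    if ((lead >> 5) == 0x06) return 2;  // 110xxxxx
    if ((lead >> 4) == 0x0E) return 3;  // 1110xxxx
    if ((lead >> 3) == 0x1E) return 4;  // 11110xxx
    return 0;
}

// True if the final byte of the text looks like the start of a multi-byte
// sequence, i.e. a UTF-8 decoder would need bytes that are not there.
bool last_byte_expects_more(const std::string & text)
{
    if (text.empty())
        return false;
    unsigned char last = static_cast<unsigned char>(text[text.size() - 1]);
    return utf8_sequence_length(last) > 1;
}
```

With the literal stored as Windows-1250, the string ends in 0xEA and last_byte_expects_more returns true; with the same literal stored as UTF-8, 'ę' is the pair 0xC4 0x99 and the check returns false.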

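I don't know whether this is the (perfect) way to go, but the following solution worked for me!

I saved my HTML template file as ANSI (or at least that's what Notepad++ says) and changed every write to the file stream from: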

file << std::string("some text with polish chars: ąśżźćńłóę");

to:

file << ToUtf8("some text with polish chars: ąśżźćńłóę");

where:

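#include <string>
#include <windows.h>

// Converts text encoded in Windows-1250 (the Central European "ANSI" code page) to UTF-8.
std::string ToUtf8(const std::string & ansiText)
{
    if (ansiText.empty())
        return std::string();
    const int ansiSize = static_cast<int>(ansiText.size());
    // First call reports how many wide characters are needed; second call converts.
    int wideSize = MultiByteToWideChar(1250, 0, ansiText.c_str(), ansiSize, NULL, 0);
    std::wstring wideText(wideSize, L'\0');
    MultiByteToWideChar(1250, 0, ansiText.c_str(), ansiSize, &wideText[0], wideSize);
    // Same two-call pattern from wide characters to UTF-8 (code page 65001).
    int utf8Size = WideCharToMultiByte(65001, 0, wideText.c_str(), wideSize, NULL, 0, NULL, NULL);
    // A std::string sized to the result replaces the fixed 1024-byte buffer, which overflowed on long input.
    std::string utf8Text(utf8Size, '\0');
    WideCharToMultiByte(65001, 0, wideText.c_str(), wideSize, &utf8Text[0], utf8Size, NULL, NULL);
    return utf8Text;
}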

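The basic idea is to use the MultiByteToWideChar() and WideCharToMultiByte() functions to convert the string from ANSI (multi-byte) to wide char, and then from wide char to UTF-8 (more here: http://www.chilkatsoft.com/p/p_348.asp). The best part is that I didn't have to change anything else (i.e. switch from std::ofstream to std::wofstream, use any third-party library, or change the way I actually use the file stream), apart from converting the strings to UTF-8, which is necessary!

It should probably work for other languages too, although I haven't tested that.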
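For readers without the Windows API, the same two steps (code-page byte to Unicode code point, then code point to UTF-8 bytes) can be sketched by hand. This is not the code from the answer, only an illustration under stated assumptions: the input is Windows-1250, only ASCII and the 18 Polish letters are mapped, and everything else becomes U+FFFD. The names Cp1250ToCodePoint, AppendUtf8 and Cp1250ToUtf8Sketch are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Step 1: one Windows-1250 byte -> Unicode code point.
// Assumption: only ASCII and the Polish letters are handled in this sketch.
unsigned int Cp1250ToCodePoint(unsigned char byte)
{
    if (byte < 0x80)
        return byte;
    switch (byte)
    {
        case 0xA5: return 0x0104; // Ą
        case 0xB9: return 0x0105; // ą
        case 0xC6: return 0x0106; // Ć
        case 0xE6: return 0x0107; // ć
        case 0xCA: return 0x0118; // Ę
        case 0xEA: return 0x0119; // ę
        case 0xA3: return 0x0141; // Ł
        case 0xB3: return 0x0142; // ł
        case 0xD1: return 0x0143; // Ń
        case 0xF1: return 0x0144; // ń
        case 0xD3: return 0x00D3; // Ó
        case 0xF3: return 0x00F3; // ó
        case 0x8C: return 0x015A; // Ś
        case 0x9C: return 0x015B; // ś
        case 0x8F: return 0x0179; // Ź
        case 0x9F: return 0x017A; // ź
        case 0xAF: return 0x017B; // Ż
        case 0xBF: return 0x017C; // ż
        default:   return 0xFFFD; // replacement character for unmapped bytes
    }
}

// Step 2: append the UTF-8 encoding of a code point below U+10000.
void AppendUtf8(std::string & out, unsigned int cp)
{
    if (cp < 0x80)
    {
        out += static_cast<char>(cp);
    }
    else if (cp < 0x800)
    {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Both steps applied to every byte of the input.
std::string Cp1250ToUtf8Sketch(const std::string & ansiText)
{
    std::string utf8Text;
    for (std::size_t i = 0; i < ansiText.size(); ++i)
        AppendUtf8(utf8Text, Cp1250ToCodePoint(static_cast<unsigned char>(ansiText[i])));
    return utf8Text;
}
```

Because the result is an ordinary std::string of UTF-8 bytes, it can be written to a std::ofstream unchanged, which is why no wide stream is needed.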

2013-05-02 23:49:59 NPS
