UTF-8 to URI percent encoding

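I am trying to convert Unicode code points into percent-encoded UTF-8 code units.

The Unicode -> UTF-8 conversion seems to work correctly: in a few tests, Hindi and Chinese characters display correctly in Notepad++ set to UTF-8 encoding, and they convert back correctly.

I thought percent encoding would be as simple as putting a '%' in front of each UTF-8 code unit, but that does not work. Instead of the expected %E5%84%A3, I see %xE5%x84%xA3 (for Unicode U+5123).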

What am I doing wrong?

Code added (note that utf8.h belongs to the UTF8-CPP library).

#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include "utf8.h"

std::string unicode_to_utf8_units(int32_t unicode) 
{ 
    unsigned char u[5] = {0,0,0,0,0}; 
    unsigned char *iter = u, *limit = utf8::append(unicode, u); 
    std::string s; 
    for (; iter != limit; ++iter) { 
        s.push_back(*iter);
    } 
    return s; 
} 

int main() 
{ 
    std::ofstream ofs("test.txt", std::ios_base::out); 
    if (!ofs.good()) { 
        std::cout << "ofstream encountered a problem." << std::endl;
        return 1;
    } 

    utf8::uint32_t unicode = 0x5123; 
    auto s = unicode_to_utf8_units(unicode); 
    for (auto &c : s) { 
        ofs << "%" << c;
    } 

    ofs.close(); 

    return 0; 
}

2013-10-06 A.B.

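Are you actually using the characters 0-9 and A-F to encode the code units? Simply putting a percent sign in front of an arbitrary code unit does not make it percent-escaped. – rightfold

It is not clear why your code produces an "x". We cannot see it. –

@not-rightfold I am using the utf8cpp library for the Unicode -> UTF-8 conversion, and as far as I can tell it works fine. –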

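What you actually need is to convert each byte value into the ASCII text of its hexadecimal representation. For example:

"é" in UTF-8 is the byte sequence { 0xc3, 0xa9 }. Note that in C++ these bytes are char values.

Each byte has to be converted separately, giving "%C3" and "%A9".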
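To make the byte-to-text step concrete, here is a minimal sketch that builds the three characters for a single byte by hand with a lookup table. The helper name percent_encode_byte is made up for this illustration; it is not part of UTF8-CPP or the standard library.

```cpp
#include <string>

// Hypothetical helper (not from any library): turns one byte into "%XX".
std::string percent_encode_byte(unsigned char byte)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    out.push_back('%');
    out.push_back(digits[byte >> 4]);    // high nibble
    out.push_back(digits[byte & 0x0F]);  // low nibble
    return out;
}
```

With this, percent_encode_byte(0xC3) yields "%C3" and percent_encode_byte(0xA9) yields "%A9".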

A straightforward way to do this is with std::ostringstream:

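// Requires <iomanip>, <sstream> and <string>.
std::ostringstream out;
std::string utf8str = "\xE5\x84\xA3";

for (std::size_t i = 0; i < utf8str.length(); ++i) {
    // setw(2)/setfill('0') keep bytes below 0x10 at two digits, e.g. %0A.
    out << '%' << std::hex << std::uppercase << std::setw(2) << std::setfill('0')
        << (int)(unsigned char)utf8str[i];
}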

Or in C++11:

for (auto c : utf8str) {
    out << '%' << std::hex << std::uppercase << std::setw(2) << std::setfill('0')
        << (int)(unsigned char)c;
}

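Note that each byte has to be cast to int, because otherwise operator<< writes the raw byte itself instead of its numeric value. Casting to unsigned char first is necessary because otherwise, where char is signed, the sign bit propagates into the int value and the output is that of a negative number, such as FFFFFFE5.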
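The effect of the two casts can be seen side by side in the small sketch below. Both helper names are invented for this illustration, and what the first one returns depends on whether char is signed on your platform.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: formats the byte WITHOUT going through unsigned char.
// Where char is signed, a byte such as 0xE5 is negative and gets sign-extended.
std::string hex_signed(char c)
{
    std::ostringstream out;
    out << std::hex << std::uppercase << static_cast<int>(c);
    return out.str();
}

// Hypothetical helper: casts to unsigned char first, so the value stays in 0..255.
std::string hex_unsigned(char c)
{
    std::ostringstream out;
    out << std::hex << std::uppercase
        << static_cast<int>(static_cast<unsigned char>(c));
    return out.str();
}
```

For the byte '\xE5', hex_unsigned gives "E5", whereas hex_signed typically gives "FFFFFFE5" on platforms where char is signed and int is 32 bits wide.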

2013-10-06 17:51:34 SirDarius

When writing to the output file stream, your code gave me the result 0x28fd6c. –

It does work fine, see: http://ideone.com/jIq1jf – SirDarius

Yes, outputting directly to 'ofs' did the trick. –
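For anyone who runs into the same thing: an address-like value such as 0x28fd6c is what you would typically get from inserting the stringstream object itself into the file stream (for example ofs << out with a pre-C++11 compiler). That is only a guess, since the commenter's code is not shown. A minimal sketch of the two usual options follows: write straight to the target stream, or insert out.str(). The function names are made up for this example.

```cpp
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: writes each byte of utf8 to os as %XX.
void write_percent_encoded(std::ostream& os, const std::string& utf8)
{
    const std::ios_base::fmtflags saved_flags = os.flags();
    const char saved_fill = os.fill('0');
    os << std::hex << std::uppercase;
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        // setw applies only to the next insertion, i.e. the number.
        os << '%' << std::setw(2)
           << static_cast<int>(static_cast<unsigned char>(utf8[i]));
    }
    os.flags(saved_flags);  // restore the caller's formatting state
    os.fill(saved_fill);
}

// Hypothetical helper: same encoding, returned as a string.
std::string percent_encode(const std::string& utf8)
{
    std::ostringstream out;
    write_percent_encoded(out, utf8);
    return out.str();  // the contents, not the stream object
}
```

Either write_percent_encoded(ofs, s) or ofs << percent_encode(s) should then put %E5%84%A3 into the file for U+5123. Note that this sketch escapes every byte, including plain ASCII letters, which is valid but more than a URI strictly requires.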
