將Unicode的UTF8表示寫入文件

我有一個專有文件（數據庫）格式，目前我正嘗試遷移到SQL數據庫。因此，我將文件轉換爲SQL轉儲，這已經正常工作。現在剩下的唯一問題是它們處理不在ASCII-decimal範圍32到126內的字符的奇怪方式。它們具有以Unicode存儲的所有這些字符的集合（十六進制 - 例如20AC =€），由它們自己索引內部索引。將Unicode的UTF8表示寫入文件

我現在的計劃是：我想創建一個表，其中存儲內部索引，unicode（十六進制）和字符表示形式（UTF-8）。此表格可以用於未來的更新。

現在到問題：我如何寫一個unicode十六進制值的UTF-8字符表示形式的文件？當前的代碼如下所示：

this->outFile.open(fileName + ".sql", std::ofstream::app); 
std::string protyp; 
this->inFile.ignore(2); // Ignore the ID = 01. 
std::getline(this->inFile, protyp); // Get the PROTYP Identifier (e.g. \321) 
protyp = "\\" + protyp; 

std::string unicodeHex; 
this->inFile.ignore(2); // Ignore the ID = 01. 
std::getline(this->inFile, unicodeHex); // Get the Unicode HEX Identifier (e.g. 002C) 

std::wstring_convert<std::codecvt_utf8<wchar_t>> converter; 
const std::wstring wide_string = this->s2ws("\\u" + unicodeHex); 
const std::string utf8_rep = converter.to_bytes(wide_string); 

std::string valueString = "('" + protyp + "', '" + unicodeHex + "', '" + utf8_rep + "')"; 

this->outFile << valueString << std::endl; 

this->outFile.close();

但這只是打印出這樣的事：

('\321', '002C', '\u002C'),

雖然所需的輸出將是：

('\321', '002C', ','),

我到底做錯了什麼？我不得不承認，當我提到字符編碼和其他東西時，我並不確定：/。如果它有任何區別，我正在使用Windows 7 64位。在此先感謝。

來源

2015-04-06 puelo

從'轉換採取\ u002C'到寬字符值出現在編譯的時候，不運行。你需要忘記'\ u'並做一個字符串到整數的轉換。 –

它的工作原理！非常感謝。如果你願意，你可以添加你的評論作爲答案。我會盡快接受它。我還會添加一些代碼作爲第二個答案來介紹我提出的解決方案。 – puelo

我的目標是給你足夠的信息來自己產生答案，看起來我成功了。您真誠的感謝是我需要的全部獎勵。 –

正如@Mark Ransom在評論中指出的，我最好的選擇是將十六進制字符串轉換爲整數並使用它。這是我做過什麼：

unsigned int decimalHex = std::stoul(unicodeHex, nullptr, 16);; 

std::string valueString = "('" + protyp + "', '" + unicodeHex + "', '" + this->UnicodeToUTF8(decimalHex) + "')";

雖然對於UnicodeToUTF8功能是從這裏Unsigned integer as UTF-8 value

std::string UnicodeToUTF8(unsigned int codepoint) 
{ 
    std::string out; 

    if (codepoint <= 0x7f) 
     out.append(1, static_cast<char>(codepoint)); 
    else if (codepoint <= 0x7ff) 
    { 
     out.append(1, static_cast<char>(0xc0 | ((codepoint >> 6) & 0x1f))); 
     out.append(1, static_cast<char>(0x80 | (codepoint & 0x3f))); 
    } 
    else if (codepoint <= 0xffff) 
    { 
     out.append(1, static_cast<char>(0xe0 | ((codepoint >> 12) & 0x0f))); 
     out.append(1, static_cast<char>(0x80 | ((codepoint >> 6) & 0x3f))); 
     out.append(1, static_cast<char>(0x80 | (codepoint & 0x3f))); 
    } 
    else 
    { 
     out.append(1, static_cast<char>(0xf0 | ((codepoint >> 18) & 0x07))); 
     out.append(1, static_cast<char>(0x80 | ((codepoint >> 12) & 0x3f))); 
     out.append(1, static_cast<char>(0x80 | ((codepoint >> 6) & 0x3f))); 
     out.append(1, static_cast<char>(0x80 | (codepoint & 0x3f))); 
    } 
    return out; 
}

來源

2015-04-06 15:38:42 puelo

嘿，我認識到這個代碼！ –

哈！沒有注意到！偉大的代碼！ – puelo

將Unicode的UTF8表示寫入文件

回答

相關問題