Writing UTF-16 to a file in binary mode

I'm trying to write a wstring to a file with ofstream in binary mode, but I think I'm doing something wrong. This is what I've tried:

ofstream outFile("test.txt", std::ios::out | std::ios::binary); 
wstring hello = L"hello"; 
outFile.write((char *) hello.c_str(), hello.length() * sizeof(wchar_t)); 
outFile.close();

Opening test.txt in, for example, Firefox with the encoding set to UTF-16, it shows as:

h�e�l�l�o�

Can anyone tell me why this happens?

EDIT:

Opening the file in a hex editor, I get:

FF FE 68 00 00 00 65 00 00 00 6C 00 00 00 6C 00 00 00 6F 00 00 00

It looks like I'm getting two extra bytes between each character for some reason?


2008-10-16 Cactuar

Added an answer about using a locale facet so that the stream converts from wchar_t to the correct output. See below. – 2008-10-16 13:01:42

I suspect that sizeof(wchar_t) is 4 in your environment, i.e. it's writing out UTF-32/UCS-4 instead of UTF-16. That's certainly what the hex dump looks like.

That's easy enough to test (just print out sizeof(wchar_t)), but I'm pretty sure that's what's going on.
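A minimal sketch of that check, assuming nothing beyond the standard library. The helper name rawBytesHex is made up for illustration; it dumps exactly the bytes that the write() call in the question would emit, so the padding becomes visible without a hex editor.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: hex-dump the raw bytes of a wide string, i.e. what
// outFile.write((char *) s.c_str(), s.length() * sizeof(wchar_t)) would emit.
std::string rawBytesHex(const std::wstring& s) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s.data());
    const std::size_t n = s.size() * sizeof(wchar_t);
    std::string hex;
    char buf[4];  // two hex digits, one space, terminating NUL
    for (std::size_t i = 0; i < n; ++i) {
        std::snprintf(buf, sizeof buf, "%02X ", static_cast<unsigned>(p[i]));
        hex += buf;
    }
    return hex;
}
```

Printing sizeof(wchar_t) next to rawBytesHex(L"hello") shows the unit size directly. On a platform where wchar_t is 4 bytes and little-endian, the dump starts 68 00 00 00 65 00 00 00, which matches the file contents in the question after the leading FF FE.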

To go from a UTF-32 wstring to UTF-16 you'll need to apply a proper encoding, as surrogate pairs come into play.
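As a rough sketch of what "a proper encoding" involves, the following converts UTF-32 code points to UTF-16LE bytes. The function names are invented for this example, the input is taken as std::u32string (C++11) rather than wstring so that the unit size is fixed, and the input is assumed to contain only valid code points.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: append one 16-bit code unit, low byte first (UTF-16LE).
void appendUnitLE(std::string& out, std::uint16_t unit) {
    out.push_back(static_cast<char>(unit & 0xFF));
    out.push_back(static_cast<char>(unit >> 8));
}

// Hypothetical helper: encode UTF-32 code points as UTF-16LE bytes.
// Assumes every code point is valid (at most U+10FFFF, not a surrogate).
std::string utf32ToUtf16LE(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in) {
        if (cp < 0x10000) {
            // Basic Multilingual Plane: a single 16-bit unit.
            appendUnitLE(out, static_cast<std::uint16_t>(cp));
        } else {
            // Supplementary planes: split the 20-bit offset into a surrogate pair.
            const std::uint32_t v = static_cast<std::uint32_t>(cp) - 0x10000;
            appendUnitLE(out, static_cast<std::uint16_t>(0xD800 + (v >> 10)));    // high surrogate
            appendUnitLE(out, static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF)));  // low surrogate
        }
    }
    return out;
}
```

The resulting bytes can be written with a binary ofstream as in the question. On a platform with a 4-byte wchar_t, the elements of a wstring could be copied into a std::u32string first, assuming they hold UCS-4 values.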


2008-10-16 07:47:34

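Yes, you're right, wchar_t is of size 4; I'm on a Mac. So that explains a lot :) I'm aware of surrogate pairs in UTF-16 and will have to look into that some more. – Cactuar 2008-10-16 08:01:10

From the output you can't tell whether it is UTF-16 or UTF-32; all it shows is that wchar_t is 4 bytes wide. The encoding of the string is not defined by the language (though it is most likely UCS-4). – 2008-10-16 13:10:30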

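You should look at the output file in a hex editor (such as WinHex) so you can see the actual bits and bytes and verify that the output really is UTF-16. Post it here and let us know the result. That will tell us whether to blame Firefox or your C++ program.

But it looks to me as if your C++ program works and Firefox is not interpreting your UTF-16 correctly. UTF-16 calls for two bytes per character (four for characters outside the Basic Multilingual Plane). But Firefox is printing twice as many characters as it should, so it is probably trying to interpret your string as UTF-8 or ASCII, which generally use just 1 byte per character.

When you say "Firefox with encoding set to UTF16", what do you mean? I'm skeptical that that would work.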


2008-10-16 07:30:13

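Here we run into the little-used properties of the locale. If you output your string as a string (rather than as raw data), you can get the locale to do the appropriate conversion auto-magically.

N.B. This code does not take into account the endianness of the wchar_t characters.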

#include <locale> 
#include <fstream> 
#include <iostream> 
// See Below for the facet 
#include "UTF16Facet.h" 

int main(int argc,char* argv[]) 
{ 
    // Construct a custom unicode facet and add it to a locale. 
    UTF16Facet *unicodeFacet = new UTF16Facet(); 
    const std::locale unicodeLocale(std::cout.getloc(), unicodeFacet); 

    // Create a stream and imbue it with the facet 
    std::wofstream saveFile; 
    saveFile.imbue(unicodeLocale); 


    // Now the stream is imbued we can open it. 
    // NB If you open the file stream first, any attempt to imbue it with a locale will silently fail. 
    saveFile.open("output.uni"); 
    saveFile << L"This is my Data\n"; 


    return(0); 
}

The file: UTF16Facet.h

#include <locale> 

class UTF16Facet: public std::codecvt<wchar_t,char,std::char_traits<wchar_t>::state_type> 
{ 
    typedef std::codecvt<wchar_t,char,std::char_traits<wchar_t>::state_type> MyType; 
    typedef MyType::state_type   state_type; 
    typedef MyType::result    result; 


    /* This function deals with converting data from the input stream into the internal stream.*/ 
    /* 
    * from, from_end: Points to the beginning and end of the input that we are converting 'from'. 
    * to, to_limit: Points to where we are writing the conversion 'to' 
    * from_next:  When the function exits this should have been updated to point at the next location 
    *     to read from. (ie the first unconverted input character) 
    * to_next:   When the function exits this should have been updated to point at the next location 
    *     to write to. 
    * 
    * status:   This indicates the status of the conversion. 
    *     possible values are: 
    *     error:  An error occurred the bad file bit will be set. 
    *     ok:   Everything went to plan 
    *     partial: Not enough input data was supplied to complete any conversion. 
    *     nonconv: no conversion was done. 
    */ 
    virtual result do_in(state_type &s, 
          const char *from,const char *from_end,const char* &from_next, 
          wchar_t  *to, wchar_t *to_limit,wchar_t* &to_next) const 
    { 
     // Loop while a complete 2-byte unit remains and the output has room. 
     for(;(from_end - from >= 2) && (to < to_limit);from += 2,++to) 
     { 
      /*Input the Data*/ 
      /* As the input 16 bits may not fill the wchar_t object 
      * Initialise it so that zero out all its bit's. This 
      * is important on systems with 32bit wchar_t objects. 
      */ 
      (*to)        = L'\0'; 

      /* Next read the data from the input stream into 
      * wchar_t object. Remember that we need to copy 
      * into the bottom 16 bits no matter what size the 
      * the wchar_t object is. 
      */ 
      reinterpret_cast<char*>(to)[0] = from[0]; 
      reinterpret_cast<char*>(to)[1] = from[1]; 
     } 
     from_next = from; 
     to_next  = to; 

     // partial: some input could not be converted yet. 
     return((from < from_end)?partial:ok); 
    } 



    /* This function deals with converting data from the internal stream to a C/C++ file stream.*/ 
    /* 
    * from, from_end: Points to the beginning and end of the input that we are converting 'from'. 
    * to, to_limit: Points to where we are writing the conversion 'to' 
    * from_next:  When the function exits this should have been updated to point at the next location 
    *     to read from. (ie the first unconverted input character) 
    * to_next:   When the function exits this should have been updated to point at the next location 
    *     to write to. 
    * 
    * status:   This indicates the status of the conversion. 
    *     possible values are: 
    *     error:  An error occurred the bad file bit will be set. 
    *     ok:   Everything went to plan 
    *     partial: Not enough input data was supplied to complete any conversion. 
    *     nonconv: no conversion was done. 
    */ 
    virtual result do_out(state_type &state, 
          const wchar_t *from, const wchar_t *from_end, const wchar_t* &from_next, 
          char   *to, char   *to_limit, char*   &to_next) const 
    { 
     // Loop while input remains and the output has room for two more bytes. 
     for(;(from < from_end) && (to_limit - to >= 2);++from,to += 2) 
     { 
      /* Output the Data */ 
      /* NB I am assuming the characters are encoded as UTF-16. 
      * This means they are 16 bits inside a wchar_t object. 
      * As the size of wchar_t varies between platforms I need 
      * to take this into consideration and only take the bottom 
      * 16 bits of each wchar_t object. 
      */ 
      to[0]  = reinterpret_cast<const char*>(from)[0]; 
      to[1]  = reinterpret_cast<const char*>(from)[1]; 

     } 
     from_next = from; 
     to_next  = to; 

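     // partial: the output filled up before all input was converted. 
     return((from < from_end)?partial:ok); 
    } 

    /* At most two external bytes correspond to one wchar_t. File streams may 
    * use this value to size their conversion buffer. 
    * (noexcept needs C++11; older compilers want throw() here instead.) 
    */ 
    virtual int do_max_length() const noexcept 
    { 
     return 2; 
    } 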
};


2008-10-16 12:56:58

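Note that your facet implements a conversion to UCS-2, not UTF-16. UTF-16 is a variable-length encoding that uses a mechanism called surrogate pairs. UCS-2 covers only a subset of Unicode, which is why UTF-16 was invented. – 2017-05-04 21:12:29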

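Using wofstream with the UTF16 facet defined above fails on Windows, because wofstream converts every byte with the value 0A into the 2 bytes 0D 0A. This is irrespective of how you pass the 0A byte in: '\x0A', L'\x0A', L'\x000A', '\n', L'\n' and std::endl all give the same result. On Windows you have to open the file with an ofstream (not a wofstream) in binary mode and write the output just as it is done in the original post.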
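A minimal sketch of that approach, assuming the text is already held as 16-bit code units in a std::u16string (C++11) so that the size of wchar_t does not matter. The function name writeUtf16LE is made up for this example, and writing a byte order mark first is a choice made here, not something the answer requires.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: write UTF-16 code units as UTF-16LE, preceded by a BOM.
// The stream is a narrow ofstream opened in binary mode, so a 0A byte stays 0A
// instead of being expanded to 0D 0A.
bool writeUtf16LE(const std::string& path, const std::u16string& text) {
    std::ofstream out(path.c_str(), std::ios::out | std::ios::binary);
    if (!out) return false;
    out.write("\xFF\xFE", 2);  // byte order mark for little-endian UTF-16
    for (char16_t unit : text) {
        const char bytes[2] = {
            static_cast<char>(unit & 0xFF),  // low byte first
            static_cast<char>(unit >> 8)     // then high byte
        };
        out.write(bytes, 2);
    }
    return out.good();
}
```

Calling writeUtf16LE("test.txt", u"hello\n") would write the bytes one unit at a time, so the result does not depend on the host byte order either.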


2009-05-22 11:16:56

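The provided Utf16Facet didn't work in gcc for big strings; here is the version that worked for me. This way the file will be saved in UTF-16LE. For UTF-16BE, simply swap the assignments in do_in and do_out, e.g. to[0] = from[1] and to[1] = from[0]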

#include <locale> // declares std::codecvt; the internal <bits/codecvt.h> header is not needed 


class UTF16Facet: public std::codecvt<wchar_t,char,std::char_traits<wchar_t>::state_type> 
{ 
    typedef std::codecvt<wchar_t,char,std::char_traits<wchar_t>::state_type> MyType; 
    typedef MyType::state_type   state_type; 
    typedef MyType::result    result; 


    /* This function deals with converting data from the input stream into the internal stream.*/ 
    /* 
    * from, from_end: Points to the beginning and end of the input that we are converting 'from'. 
    * to, to_limit: Points to where we are writing the conversion 'to' 
    * from_next:  When the function exits this should have been updated to point at the next location 
    *     to read from. (ie the first unconverted input character) 
    * to_next:   When the function exits this should have been updated to point at the next location 
    *     to write to. 
    * 
    * status:   This indicates the status of the conversion. 
    *     possible values are: 
    *     error:  An error occurred the bad file bit will be set. 
    *     ok:   Everything went to plan 
    *     partial: Not enough input data was supplied to complete any conversion. 
    *     nonconv: no conversion was done. 
    */ 
    virtual result do_in(state_type &s, 
          const char *from,const char *from_end,const char* &from_next, 
          wchar_t  *to, wchar_t *to_limit,wchar_t* &to_next) const 
    { 

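     // Convert while a complete 2-byte unit remains and the output has room. 
     for(;(from_end - from >= 2) && (to < to_limit);from += 2,++to) 
     { 
      (*to)        = L'\0'; 

      reinterpret_cast<char*>(to)[0] = from[0]; 
      reinterpret_cast<char*>(to)[1] = from[1]; 
     } 
     // Report the first unconverted input and the next free output slot. 
     from_next = from; 
     to_next  = to; 

     // partial: some input could not be converted yet. 
     return((from < from_end)?partial:ok); 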
    } 



    /* This function deals with converting data from the internal stream to a C/C++ file stream.*/ 
    /* 
    * from, from_end: Points to the beginning and end of the input that we are converting 'from'. 
    * to, to_limit: Points to where we are writing the conversion 'to' 
    * from_next:  When the function exits this should have been updated to point at the next location 
    *     to read from. (ie the first unconverted input character) 
    * to_next:   When the function exits this should have been updated to point at the next location 
    *     to write to. 
    * 
    * status:   This indicates the status of the conversion. 
    *     possible values are: 
    *     error:  An error occurred the bad file bit will be set. 
    *     ok:   Everything went to plan 
    *     partial: Not enough input data was supplied to complete any conversion. 
    *     nonconv: no conversion was done. 
    */ 
    virtual result do_out(state_type &state, 
          const wchar_t *from, const wchar_t *from_end, const wchar_t* &from_next, 
          char   *to, char   *to_limit, char*   &to_next) const 
    { 

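     // Convert while input remains and the output has room for two more bytes. 
     for(;(from < from_end) && (to_limit - to >= 2);++from, to += 2) 
     { 
      to[0]  = reinterpret_cast<const char*>(from)[0]; 
      to[1]  = reinterpret_cast<const char*>(from)[1]; 
     } 
     // Report the first unconverted input and the next free output byte. 
     from_next = from; 
     to_next  = to; 

     // partial: the output filled up before all input was converted. 
     return((from < from_end)?partial:ok); 
    } 

    /* At most two external bytes correspond to one wchar_t. File streams may 
    * use this value to size their conversion buffer. 
    * (noexcept needs C++11; older compilers want throw() here instead.) 
    */ 
    virtual int do_max_length() const noexcept 
    { 
     return 2; 
    } 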
};
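Both facets copy the first two bytes of each wchar_t in memory, which only yields the low 16 bits on a little-endian host. As a sketch of a byte-order-independent alternative, the low 16 bits can be extracted with shifts and masks. The function name low16Bytes and its bigEndian flag are invented for this example, and like the facets it handles UCS-2 only (no surrogate pairs).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: serialize the low 16 bits of each wchar_t using shifts,
// so the result does not depend on the host's byte order.
// bigEndian = false gives UTF-16LE byte order, true gives UTF-16BE byte order.
std::string low16Bytes(const std::wstring& in, bool bigEndian) {
    std::string out;
    out.reserve(in.size() * 2);
    for (std::size_t i = 0; i < in.size(); ++i) {
        // Keep only the bottom 16 bits, whatever the width of wchar_t is.
        const unsigned long v = static_cast<unsigned long>(in[i]) & 0xFFFFul;
        const char lo = static_cast<char>(v & 0xFF);
        const char hi = static_cast<char>(v >> 8);
        out.push_back(bigEndian ? hi : lo);
        out.push_back(bigEndian ? lo : hi);
    }
    return out;
}
```

The same two expressions could replace the reinterpret_cast lines inside do_out, with the byte order chosen once rather than by editing the assignments.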


2012-06-09 02:59:52

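If you are using the C++11 standard, this is very easy (because there are a lot of additional includes like "utf8" which solve this problem for good).

But if you want multi-platform code that works with older standards, you can use this method to write with streams: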
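The answer does not say which C++11 facility it means. One candidate, shown here purely as an assumption, is the std::codecvt_utf16 facet from the <codecvt> header, which was added in C++11 and deprecated in C++17. The function name writeWideAsUtf16LE is made up for this sketch.

```cpp
#include <codecvt>  // C++11; deprecated since C++17
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: let a standard facet convert wchar_t to UTF-16LE bytes.
bool writeWideAsUtf16LE(const std::string& path, const std::wstring& text) {
    typedef std::codecvt_utf16<wchar_t, 0x10FFFF, std::little_endian> Facet;
    std::wofstream out;
    // Imbue before opening, as the facet answer earlier in the thread advises.
    out.imbue(std::locale(out.getloc(), new Facet));
    // Binary mode keeps the runtime from translating line endings.
    out.open(path.c_str(), std::ios::out | std::ios::binary);
    if (!out) return false;
    out << text;
    out.flush();
    return out.good();
}
```

No byte order mark is written in this sketch. If one is wanted, streaming the character U+FEFF first is one way to get it.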

Read the article about UTF converter for streams
Add stxutif.h from the sources above to your project

Open the file in ANSI mode and add the BOM to the start of the file, like this:

std::ofstream fs; 
fs.open(filepath, std::ios::out|std::ios::binary); 

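// UTF-8 byte order mark 
const unsigned char smarker[3] = {0xEF, 0xBB, 0xBF}; 

// write() emits exactly three bytes; operator<< would treat the array as a 
// NUL-terminated string and read past its end. 
fs.write(reinterpret_cast<const char*>(smarker), sizeof(smarker)); 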
fs.close();

Then open the file as UTF and write your content there:

std::wofstream fs; 
fs.open(filepath, std::ios::out|std::ios::app); 

std::locale utf8_locale(std::locale(), new utf8cvt<false>); 
fs.imbue(utf8_locale); 

fs << .. // Write anything you want...


2012-09-20 07:45:14
