2011-01-11 139 views
3

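How do I convert Unicode characters to their escaped ASCII equivalent in C#?

My starting string contains an encoded Unicode character, "&#xfc;". I pass the string to an object that performs some logic and returns another string. In that returned string the original encoded character has been converted to its Unicode equivalent, "ü".

I need to get the original encoded character back, but so far I have not been able to.

I tried the HttpUtility.HtmlEncode() method, but it returns "&#252;", which is not the same.

Can anyone help?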

+0

In what way are they not the same? – 2011-01-11 23:08:35

Answers

4

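They are pretty much the same, at least for display purposes. HttpUtility.HtmlEncode uses decimal encoding, in the format &#DECIMAL;, whereas your original is encoded in hexadecimal, in the format &#xHEX;. Since hexadecimal fc is 252 in decimal, the two references identify the same character.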
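To see the equivalence concretely, here is a minimal sketch that decodes both forms and compares the results. It assumes System.Net.WebUtility.HtmlDecode (available since .NET Framework 4) rather than HttpUtility, only so that it needs no System.Web reference; the class and method names are made up for illustration.

```csharp
using System;
using System.Net;

static class EntityCheck
{
    // Returns true when two character references decode to the same text.
    // (Hypothetical helper, not part of any library.)
    public static bool SameCharacter(string a, string b)
    {
        return WebUtility.HtmlDecode(a) == WebUtility.HtmlDecode(b);
    }
}
```

Calling EntityCheck.SameCharacter("&#xfc;", "&#252;") should return true, because both references decode to "ü". The two strings only differ if something compares them byte for byte.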

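If you really do need the hex-encoded version, consider parsing out the decimal number and converting it to hex before putting it back into the &#xHEX; format. Something like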

string unicode = "ü";
string decimalEncoded = HttpUtility.HtmlEncode(unicode); // "&#252;"
// strip the leading "&#" and the trailing ";" to get the decimal digits
int codePoint = int.Parse(decimalEncoded.Substring(2, decimalEncoded.Length - 3));
string hexEncoded = string.Format("&#x{0:X};", codePoint); // "&#xFC;"

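
The snippet above handles a string that consists of a single encoded character. For a longer string with several references, the same parse-and-reformat step could be applied to each match of a regular expression. This is only a sketch under that assumption; NumericReferences and DecimalToHex are hypothetical names, and the pattern accepts at most seven digits because the largest code point (1114111) has seven.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

static class NumericReferences
{
    // Rewrites every decimal reference (&#252;) as a lowercase hex reference (&#xfc;).
    // Named entities such as &amp; and existing hex references are left untouched.
    public static string DecimalToHex(string html)
    {
        return Regex.Replace(html, @"&#([0-9]{1,7});", m =>
        {
            int codePoint = int.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture);
            return "&#x" + codePoint.ToString("x", CultureInfo.InvariantCulture) + ";";
        });
    }
}
```

For example, NumericReferences.DecimalToHex("M&#252;nchen &amp; &#8364;5") gives "M&#xfc;nchen &amp; &#x20ac;5". The lowercase "x" format specifier is used here so the result matches the "&#xfc;" spelling from the question; use "X" if uppercase digits are preferred.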
0

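I just worked through this problem the other day.

It is a bit more complicated than looking at individual characters. You need to roll your own HtmlEncode() method. Strings in the .NET world are UTF-16 encoded. A Unicode code point (which is what an HTML numeric character reference identifies) is a 32-bit unsigned integer value. This mostly matters when you have to deal with characters outside Unicode's "Basic Multilingual Plane".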
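
As a small illustration of the UTF-16 point, separate from the full listing in this answer, the sketch below walks a string one code point at a time using the framework's char.ConvertToUtf32 and char.IsHighSurrogate. The CodePoints class and its Enumerate method are hypothetical names introduced here.

```csharp
using System;
using System.Collections.Generic;

static class CodePoints
{
    // Yields one Unicode code point per character, joining surrogate pairs.
    public static IEnumerable<int> Enumerate(string s)
    {
        for (int i = 0; i < s.Length; i++)
        {
            // ConvertToUtf32 throws ArgumentException on an unpaired surrogate.
            int codePoint = char.ConvertToUtf32(s, i);
            if (char.IsHighSurrogate(s[i]))
            {
                i++; // skip the low surrogate that was just consumed
            }
            yield return codePoint;
        }
    }
}
```

For the string "\U0001F600", which holds a single character outside the Basic Multilingual Plane, string.Length is 2 but Enumerate yields the single value 0x1F600. That value, not the two surrogate halves, is what belongs in a reference such as &#x1F600;.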

The following code should do what you want:

using System; 
using System.Configuration ; 
using System.Globalization ; 
using System.Collections.Generic ; 
using System.Text; 


namespace TestDrive 
{ 
    class Program 
    { 
     static void Main() 
     { 
      // note: \u takes exactly four hex digits, so this literal is U+ABC1 followed by the text "23" 
      string src = "foo \uABC123 bar" ; 
      string converted = HtmlEncode(src) ; 

      return ; 
     } 

     static string HtmlEncode(string s) 
     { 
      // 
      // In the .Net world, strings are UTF-16 encoded. That means that Unicode code points above 0xFFFF 
      // (outside the Basic Multilingual Plane) are stored in the string as two-char surrogate pairs. So to 
      // properly turn them into HTML numeric character references (decimal or hex), we first need the UTF-32 encoding. 
      // 
      uint[]  utf32Chars = StringToArrayOfUtf32Chars(s) ; 
      StringBuilder sb   = new StringBuilder(2000) ; // set a reasonable initial size for the buffer 

      // iterate over the utf-32 encoded characters 
      foreach (uint codePoint in utf32Chars) 
      { 

       if (codePoint > 0x0000007F) 
       { 
        // if the code point is greater than 0x7F, it gets turned into an HTML numeric character reference 
        sb.AppendFormat("&#x{0:X};" , codePoint) ; // hex escape sequence 
        //sb.AppendFormat("&#{0};" , codePoint) ; // decimal escape sequence 
       } 
       else 
       { 
        // if less than or equal to 0x7F, it goes into the string as-is, 
        // except for the 5 SGML/XML/HTML reserved characters. You might 
        // want to also escape all the ASCII control characters (those chars 
        // in the range 0x00 - 0x1F). 

        // convert the uint to a UTF-16 character 
        char ch = Convert.ToChar(codePoint) ; 

        // do the needful. 
        switch (ch) 
        { 
        case '"' : sb.Append("&quot;" ) ; break ; 
        case '\'' : sb.Append("&apos;" ) ; break ; 
        case '&' : sb.Append("&amp;" ) ; break ; 
        case '<' : sb.Append("&lt;"  ) ; break ; 
        case '>' : sb.Append("&gt;"  ) ; break ; 
        default : sb.Append(ch.ToString()) ; break ; 
        } 
       } 
      } 

      // return the escaped, utf-16 string back to the caller. 
      string encoded = sb.ToString() ; 
      return encoded ; 
     } 

     /// <summary> 
     /// Convert a UTF-16 encoded .Net string into an array of UTF-32 encoding Unicode chars 
     /// </summary> 
     /// <param name="s"></param> 
     /// <returns></returns> 
     private static uint[] StringToArrayOfUtf32Chars(string s) 
     { 
      Byte[] bytes  = Encoding.UTF32.GetBytes(s) ; 
      uint[] utf32Chars = (uint[]) Array.CreateInstance(typeof(uint) , bytes.Length/sizeof(uint)) ; 

      for (int i = 0 , j = 0 ; i < bytes.Length ; i += 4 , ++j) 
      { 
       utf32Chars[ j ] = BitConverter.ToUInt32(bytes , i) ; 
      } 

      return utf32Chars ; 
     } 




    } 

} 

Hope this helps!

0

Or you can try this code:

using System; 
using System.Collections.Generic; 
using System.Linq; 
using System.Text; 
using System.Web; 
using System.Configuration; 
using System.Globalization; 

namespace SimpleCGIEXE 
{ 
    class Program 
    { 
     static string Uni2Html(string src) 
     { 
      string temp1 = HttpUtility.UrlEncodeUnicode(src); 
      string temp2 = temp1.Replace('+', ' '); 
      string res = string.Empty; 
      int pos1 = 0, pos2 = 0; 
      while (true){ 
       pos2=temp2.IndexOf("%",pos1); 
       if (pos2 < 0) break; 
       if (temp2[pos2 + 1] == 'u') 
       { 
        res += temp2.Substring(pos1, pos2 - pos1); 
        res += "&#x"; 
        res += temp2.Substring(pos2 + 2, 4); 
        res += ";"; 
        pos1 = pos2 + 6; 
       } 
       else 
       { 
        res += temp2.Substring(pos1, pos2 - pos1); 
        string stASCII = temp2.Substring(pos2 + 1, 2); 
        byte[] pdASCII = new byte[1]; 
        pdASCII[0] = byte.Parse(stASCII, System.Globalization.NumberStyles.AllowHexSpecifier); 
        res += Encoding.ASCII.GetString(pdASCII); 
        pos1 = pos2 + 3; 
       } 
      } 
      res += temp2.Substring(pos1); 
      return res; 
     } 
     static void Main(string[] args) 
     { 
      Console.WriteLine("Content-type: text/html;charset=utf-8\r\n"); 
      String st = "Vietnamese string: Thử một xâu unicode @@ # ~ .^ % !"; 
      Console.WriteLine(Uni2Html(st) + "<br>"); 
      st = "A chinese string: 我愛你 (I love you)"; 
      Console.WriteLine(Uni2Html(st) + "<br>"); 
     } 
    } 
} 
+0

You could improve your answer by adding some explanation. Thanks. – 2013-09-14 19:14:28
