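Comparing Unicode strings in C returns different values than C#
So I am trying to write a comparison function in C that takes UTF-8 encoded Unicode strings and uses the Windows CompareStringEx() function, and I expect it to behave just like .NET's CultureInfo.CompareInfo.Compare().
The function I wrote in C works some of the time, but not in all cases, and I am trying to figure out why. Here is a case that fails (it passes in C#, but not in C):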
CultureInfo cultureInfo = new CultureInfo("en-US");
CompareOptions compareOptions = CompareOptions.IgnoreCase | CompareOptions.IgnoreKanaType | CompareOptions.IgnoreWidth;
string stringA = "คนอ้วน ๆ";
string stringB = "はじめまして";
//Result is -1 which is expected
int result = cultureInfo.CompareInfo.Compare(stringA, stringB);
Here is the C I wrote. Keep in mind that it is supposed to take UTF-8 encoded strings and use the Windows CompareStringEx() function, so a conversion is necessary.
#include <windows.h>

// Compare flags for the string comparison
#define COMPARE_STRING_FLAGS (NORM_IGNORECASE | NORM_IGNOREKANATYPE | NORM_IGNOREWIDTH)
int CompareStrings(int lenA, const void *strA, int lenB, const void *strB)
{
    LCID ENGLISH_LCID = MAKELCID(MAKELANGID(LANG_ENGLISH, SUBLANG_ENGLISH_US), SORT_DEFAULT);
    int compareValue = -1;
    // Get the size of the strings as UTF-16 encoded Unicode strings.
    // Note: Passing 0 as the last parameter makes MultiByteToWideChar return the
    // buffer size (in WCHARs) required to convert the given string to UTF-16.
    int strAWStrBufferSize = MultiByteToWideChar(CP_UTF8, 0, (LPCSTR)strA, lenA, NULL, 0);
    int strBWStrBufferSize = MultiByteToWideChar(CP_UTF8, 0, (LPCSTR)strB, lenB, NULL, 0);
    // Allocate the buffers that store the converted UTF-16 values
    LPWSTR utf16StrA = (LPWSTR) GlobalAlloc(GMEM_FIXED, strAWStrBufferSize * sizeof(WCHAR));
    LPWSTR utf16StrB = (LPWSTR) GlobalAlloc(GMEM_FIXED, strBWStrBufferSize * sizeof(WCHAR));
    // Convert the UTF-8 strings (SQLite will pass them as UTF-8 to us) to standard
    // Windows WCHAR (UTF-16/UCS-2) encoding for Unicode so they can be used in the
    // Windows CompareStringEx() function.
    if(strAWStrBufferSize != 0)
    {
        MultiByteToWideChar(CP_UTF8, 0, (LPCSTR)strA, lenA, utf16StrA, strAWStrBufferSize);
    }
    if(strBWStrBufferSize != 0)
    {
        MultiByteToWideChar(CP_UTF8, 0, (LPCSTR)strB, lenB, utf16StrB, strBWStrBufferSize);
    }
    // Compare the strings using the Windows compare function.
    // Note: lenA and lenB are explicit byte counts without a null terminator, so the
    // converted sizes contain no terminator either and are passed unchanged.
    if(NULL != utf16StrA && NULL != utf16StrB)
    {
        compareValue = CompareStringEx(L"en-US", COMPARE_STRING_FLAGS, utf16StrA, strAWStrBufferSize, utf16StrB, strBWStrBufferSize, NULL, NULL, 0);
    }
    // In the Windows CompareStringEx() function, 0 indicates an error, 1 indicates less than,
    // 2 indicates equal to, 3 indicates greater than, so subtract 2 to maintain C convention
    if(compareValue > 0)
    {
        compareValue -= 2;
    }
    return compareValue;
}
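One detail of the result handling is worth spelling out: an error result of 0 from CompareStringEx() is returned unchanged, and 0 means "equal" in the C convention. The following is a minimal sketch of a mapping that keeps the error case separate. The names prefixed with `sketch_` and `SKETCH_` are hypothetical; the numeric values follow the 0/1/2/3 scheme described in the comment above and are redefined locally so the sketch builds without the Windows headers.

```c
/* Values follow the result scheme described for CompareStringEx
   (0 = error, 1 = less, 2 = equal, 3 = greater). The SKETCH_ names are
   hypothetical and exist only to keep this example self-contained. */
enum {
    SKETCH_CSTR_LESS_THAN    = 1,
    SKETCH_CSTR_EQUAL        = 2,
    SKETCH_CSTR_GREATER_THAN = 3
};

/* Map a CompareStringEx-style result to -1/0/1.
   Returns 1 and writes *order on success; returns 0 and leaves *order
   untouched when the API result signals an error. */
static int sketch_map_compare_result(int apiResult, int *order)
{
    if (apiResult < SKETCH_CSTR_LESS_THAN || apiResult > SKETCH_CSTR_GREATER_THAN) {
        return 0;
    }
    *order = apiResult - SKETCH_CSTR_EQUAL;
    return 1;
}
```

A caller can then decide what a failed comparison should mean for the collation (for example, falling back to a byte-wise comparison) instead of silently reporting the strings as equal.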
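Now, if I run the following code, I expect the result to be -1 based on the .NET implementation (see above), but I get 1, indicating that the first string is greater: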
char strA[50] = "คนอ้วน ๆ";
char strB[50] = "はじめまして";
// Will be 1 when we expect it to be -1
int result = CompareStrings(strlen(strA), strA, strlen(strB), strB);
Any ideas as to why I get different results? I am using the same LCID/cultureInfo and compareOptions in both implementations, and as far as I can tell the conversions succeed.
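One way to double-check the conversion step is to compute independently how many UTF-16 code units each UTF-8 input should produce and compare that with what MultiByteToWideChar() returns. When MultiByteToWideChar() is given an explicit byte count that excludes the null terminator, the count it returns excludes the terminator as well. The helper below is a rough sketch with a hypothetical name; it assumes well-formed UTF-8 and does no validation.

```c
#include <stddef.h>

/* Count the UTF-16 code units needed for `len` bytes of UTF-8.
   Assumes well-formed input; this is a counter, not a validator. */
static size_t sketch_utf16_units(const char *utf8, size_t len)
{
    size_t units = 0;
    for (size_t i = 0; i < len; ++i) {
        unsigned char b = (unsigned char)utf8[i];
        if ((b & 0xC0) == 0x80) {
            continue;                  /* continuation byte: its lead byte was counted */
        }
        units += (b >= 0xF0) ? 2 : 1;  /* 4-byte sequences need a surrogate pair */
    }
    return units;
}
```

Both test strings in the question contain only characters from the Basic Multilingual Plane, so for them the expected count is simply the number of code points.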
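FYI: this function will be used as a custom collation in SQLite. That is not relevant to the question, but it explains the function signature in case anyone is wondering.
Update: I have also determined that when the same code runs under .NET 4, I see the same behavior that I see in native code. So there is a discrepancy between .NET versions. See my answer below for the reasoning behind this.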
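Not related to your question, but you are leaking memory. – dalle
Is it possible that 'CompareStrings' treats the bytes as characters in some 8-bit code page and applies collation rules instead of comparing byte values? I would expect this kind of broken behavior from Windows... –
@dalle: How am I leaking memory? Any enhancements to this code would be appreciated. –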
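On the leak: the two buffers obtained from GlobalAlloc() in CompareStrings() are never released, so every call loses both allocations. The sketch below shows the general shape of a fix, releasing both buffers on every return path. It is an illustration under stated assumptions, not the asker's code: malloc()/free() stand in for GlobalAlloc()/GlobalFree() so it builds outside Windows, the function name is hypothetical, and a byte-wise comparison is a placeholder for the CompareStringEx() call.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for CompareStrings: allocates two scratch buffers,
   compares, and releases both buffers before returning.
   In the original code the matching calls would be GlobalAlloc/GlobalFree. */
static int sketch_compare_with_cleanup(const char *a, size_t lenA,
                                       const char *b, size_t lenB)
{
    int result = 0;
    /* +1 avoids a zero-size allocation for empty input */
    char *bufA = malloc(lenA + 1);
    char *bufB = malloc(lenB + 1);

    if (bufA != NULL && bufB != NULL) {
        memcpy(bufA, a, lenA);
        memcpy(bufB, b, lenB);
        /* Placeholder comparison (byte order); the real code would call
           CompareStringEx on the converted buffers here. */
        size_t n = lenA < lenB ? lenA : lenB;
        int c = memcmp(bufA, bufB, n);
        if (c != 0) {
            result = (c < 0) ? -1 : 1;
        } else {
            result = (lenA > lenB) - (lenA < lenB);
        }
    }

    /* free(NULL) is a no-op, so this is safe even if an allocation failed */
    free(bufA);
    free(bufB);
    return result;
}
```

In the Windows version, the equivalent step would be to pass each non-NULL buffer to GlobalFree() after the comparison and before the function returns.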