Function to go back 1 UTF-8 character?

I have a function that advances one UTF-8 character and returns the number of bytes it took to get there:

// Moves the iterator to the next Unicode character in the string,
// returns the number of bytes skipped.
// (Member function of an enclosing class, hence the trailing const.)
template<typename Iterator1, typename Iterator2>
inline size_t bringToNextUnichar(Iterator1& it,
    const Iterator2& last) const {
    if(it == last) return 0;
    unsigned char c;
    size_t res = 1;
    for(++it; last != it; ++it, ++res) {
        c = *it;
        // Stop at a start byte: 0xxxxxxx (ASCII) or 11xxxxxx (lead byte).
        if(!(c&0x80) || ((c&0xC0) == 0xC0)) break;
    }

    return res;
}
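To check the forward step on real data, here is a minimal free-standing sketch of the same loop as `bringToNextUnichar`, used to count code points. `advanceOneUtf8` and `countUtf8Chars` are hypothetical names, and the input is assumed to be valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical free-function version of bringToNextUnichar.
// Advances `it` past one UTF-8 character and returns the bytes skipped,
// or 0 if `it` is already at `last`.
template <typename Iter>
std::size_t advanceOneUtf8(Iter& it, const Iter& last) {
    if (it == last) return 0;
    std::size_t res = 1;
    for (++it; it != last; ++it, ++res) {
        unsigned char c = static_cast<unsigned char>(*it);
        // Stop at the next start byte: 0xxxxxxx or 11xxxxxx.
        if (!(c & 0x80) || (c & 0xC0) == 0xC0) break;
    }
    return res;
}

// Counts code points by stepping forward until nothing is left.
std::size_t countUtf8Chars(const std::string& s) {
    std::size_t n = 0;
    auto it = s.begin();
    while (advanceOneUtf8(it, s.end()) != 0) ++n;
    return n;
}
```

For the 10-byte string `"a\xC3\xA9\xE2\x82\xAC\xF0\x9F\x98\x80"` ("aé€😀"), successive calls return 1, 2, 3 and 4 bytes, and `countUtf8Chars` returns 4.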

How can I modify it so that I can go back one Unicode character from an arbitrary character?

Thanks


2011-02-11 jmasterx

Just decrement the iterator instead of incrementing it.

// Moves the iterator to the previous Unicode character in the string,
// returns the number of bytes skipped.
template<typename Iterator1, typename Iterator2>
inline size_t bringToPrevUnichar(Iterator1& it,
    const Iterator2& first) const {
    if(it == first) return 0;
    unsigned char c;
    size_t res = 1;
    for(--it; first != it; --it, ++res) { // Note: --it instead of ++it
        c = *it;
        // A start byte (0xxxxxxx or 11xxxxxx) begins the previous character.
        // Reaching `first` also stops the loop; it is assumed to be a start byte.
        if(!(c&0x80) || ((c&0xC0) == 0xC0)) break;
    }

    return res;
}
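A minimal free-function sketch of the decrement approach, stepping backwards from the end of a string one character at a time. `retreatOneUtf8` is a hypothetical name, and the input is assumed to be valid UTF-8 so that `first` is a start byte.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical free-function version of the decrement loop.
// Moves `it` back to the start of the previous UTF-8 character and returns
// the number of bytes stepped over; returns 0 if `it` is already at `first`.
template <typename Iter>
std::size_t retreatOneUtf8(Iter& it, const Iter& first) {
    if (it == first) return 0;
    std::size_t res = 1;
    for (--it; it != first; --it, ++res) {
        unsigned char c = static_cast<unsigned char>(*it);
        // A start byte (0xxxxxxx or 11xxxxxx) ends the backward scan.
        if (!(c & 0x80) || (c & 0xC0) == 0xC0) break;
    }
    return res;
}
```

Starting at the end of `"a\xC3\xA9\xE2\x82\xAC\xF0\x9F\x98\x80"` ("aé€😀"), successive calls return 4, 3, 2, 1 and then 0, mirroring the forward direction.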


2011-02-11 01:28:50 Maz

A UTF-8 character can take more than one char (byte). – 2011-02-11 01:33:30

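UTF-8 start bytes are either 0xxxxxxx or 11xxxxxx. No other byte in a UTF-8 stream matches these patterns. From this you can design a function `bool isStartByte(unsigned char c)`. From there, the rest of the work has nothing to do with C++ iterators. Have fun.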
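One possible `isStartByte`, plus an index-based helper that backs up from an arbitrary byte position (including one in the middle of a character) to the start of the preceding character. `prevCharStart` is a hypothetical name, and the string is assumed to hold valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True for 0xxxxxxx (ASCII) and 11xxxxxx (lead byte);
// false only for continuation bytes 10xxxxxx.
bool isStartByte(unsigned char c) {
    return (c & 0xC0) != 0x80;
}

// Hypothetical helper: for a byte index i (0 <= i <= s.size()), returns the
// index where the character before position i starts; returns 0 for i == 0.
std::size_t prevCharStart(const std::string& s, std::size_t i) {
    while (i > 0) {
        --i;
        if (isStartByte(static_cast<unsigned char>(s[i]))) break;
    }
    return i;
}
```

Testing `(c & 0xC0) != 0x80` is equivalent to "0xxxxxxx or 11xxxxxx", since the only excluded top-two-bit pattern is `10`.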


2011-02-11 01:34:14 rlibby

In UTF-8, there are three kinds of bytes...

0xxxxxxx : ASCII (a complete one-byte character)
10xxxxxx : 2nd, 3rd or 4th byte of a multibyte code
11xxxxxx : 1st byte of a multibyte code

So step back until you find a 0xxxxxxx or 11xxxxxx byte.
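The three byte kinds map directly onto a small classifier, and "step back until you find a non-continuation byte" becomes a reverse search. A sketch using reverse iterators (`std::make_reverse_iterator`, C++14); `ByteKind`, `classify` and `stepBack` are hypothetical names, and valid UTF-8 input is assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// The three byte kinds from the table above.
enum class ByteKind { Ascii, Continuation, Lead };

ByteKind classify(unsigned char c) {
    if ((c & 0x80) == 0x00) return ByteKind::Ascii;        // 0xxxxxxx
    if ((c & 0xC0) == 0x80) return ByteKind::Continuation; // 10xxxxxx
    return ByteKind::Lead;                                 // 11xxxxxx
}

// Hypothetical helper: searches backwards from `pos` for the nearest earlier
// byte that is not a continuation byte. Returns `first` if none is found.
std::string::const_iterator stepBack(std::string::const_iterator first,
                                     std::string::const_iterator pos) {
    auto rbegin = std::make_reverse_iterator(pos);
    auto rend = std::make_reverse_iterator(first);
    auto r = std::find_if(rbegin, rend, [](char ch) {
        return classify(static_cast<unsigned char>(ch)) != ByteKind::Continuation;
    });
    // r.base() points one past the byte r refers to, so step back onto it.
    return r == rend ? first : std::prev(r.base());
}
```

Using `std::find_if` over reverse iterators keeps the "search backwards" intent explicit, at the cost of the `base()` off-by-one adjustment.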


2011-02-11 01:35:30 Steve314
