XMLParser in Pharo claims U+00A0 is "Invalid UTF-8"

Given the input:

<?xml version='1.0' encoding='UTF-8' standalone='yes' ?> 
<sms body=". what" />

where the character after the "." in the body attribute of the sms tag is U+00A0,

I get the error:

XMLEncodingException: Invalid UTF-8 character encoding (line 2) (column 13)

IIUC, the UTF-8 representation of that character is 0xC2 0xA0, per Wikipedia. Sure enough, bytes 72 and 73 of the input are 194 and 160, respectively.
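The byte values quoted above can be checked independently of Pharo. The following is a minimal sketch in Python, used here only as a neutral way to inspect bytes; the variable names are illustrative and not part of XMLParser.

```python
# U+00A0 (NO-BREAK SPACE) encoded as UTF-8.
nbsp = "\u00a0"
encoded = nbsp.encode("utf-8")

# Decimal and hexadecimal views of the resulting two bytes.
decimal_bytes = list(encoded)
hex_bytes = [f"0x{b:02X}" for b in encoded]

print(decimal_bytes)  # [194, 160]
print(hex_bytes)      # ['0xC2', '0xA0']
```

So the bytes in the file are a valid UTF-8 encoding of U+00A0, which suggests the problem lies in how the input reaches the parser rather than in the file itself.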

This looks like a bug in XMLParser, or am I missing something?


2016-07-28 Sean DeNigris

Cannot reproduce: XMLDOMParser parse: '<?xml version=''1.0'' encoding=''UTF-8'' standalone=''yes''?> '' –

Thanks to monty for coming to the rescue on the Pharo User's list:

You're double decoding. Use onFileNamed:/parseFileNamed: instead (and the DOM printToFileNamed: family of messages when writing) and let XMLParser take care of this for you, or disable XMLParser decoding before parsing with #decodesCharacters:.

Longer explanation:

The class #on:/#parse: take either a string or a stream (read the definitions). You gave it a FileReference, but because the argument is tested with isString and sent #readStream otherwise, it didn't blow up then.

File refs sent #readStream return file streams that do automatic decoding. But XMLParser automatically attempts its own decoding too, if:

The input starts with a BOM or it can be inferred by null bytes before or after the first non-null byte.

There is an encoding declaration with a non-UTF-8 encoding.

There is a UTF-8 encoding declaration but the stream is not a normal ReadStream (your case).

So it gets decoded twice, and the decoded value of the char causes the error. I'll consider changing the heuristic to make it less eager to decode.
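The double-decoding failure described above can be reproduced outside Pharo. The sketch below is in Python and is not XMLParser's actual code: `decode_utf8_from_chars` is a hypothetical stand-in for a second decoder that treats each already-decoded character as if it were a raw byte, which is an assumption about how the error arises.

```python
def decode_utf8_from_chars(text):
    """Hypothetical second decoder: treats each character as one raw byte."""
    # bytes() raises ValueError for code points above 255; not hit in this example.
    raw = bytes(ord(ch) for ch in text)
    return raw.decode("utf-8")


# Second line of the input file, with U+00A0 stored as the bytes 0xC2 0xA0.
file_bytes = b'<sms body=".\xc2\xa0what" />'

# First decoding: what an automatically decoding file stream hands over.
once = file_bytes.decode("utf-8")

# Second decoding: the lone 0xA0 is a continuation byte with no lead byte.
try:
    decode_utf8_from_chars(once)
    double_decode_error = None
except UnicodeDecodeError as exc:
    double_decode_error = exc

print(double_decode_error)
```

After the first pass, U+00A0 is a single character with value 160 (0xA0). Read again as a byte, 0xA0 cannot start a UTF-8 sequence, so the second pass fails at zero-based offset 12 of this line, which is column 13, the position named in the reported error.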


2016-08-08 12:45:20
