XML parsing and Unicode (again)

XML header:

<?xml version="1.0" encoding="UTF-8"?><points>

XML data fragment:

<point> 
<id>1781</id><lon>43.245766666667</lon><lat>56.636883333333</lat> 
<type>vert</type><last_update>2016-11-18 22:55:11</last_update> 
<active>1</active><verified>1</verified><international>0</international><name>Vеrshilovo</name><name_ru>Вершилово</name_ru><city/><belongs>АОН</belongs><inde

Code:

import xml.etree.ElementTree as ET

tree = ET.parse(XMLFIL)
root = tree.getroot()
allpoints = root.findall('point')
for point in allpoints:
    id = point.find('id').text
    name = point.find('name').text.encode('utf8')
    print name

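This rewards me with "AttributeError: 'NoneType' object has no attribute 'encode'". If I leave out the 'encode', I get the infamous "'ascii' codec can't encode character u'\u0435' in position 1: ordinal not in range(128)".

NB the error is with the 'e' in 'Vershilovo': it looks fine as shown here, but a hexdump of the XML data gives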

00000000 56 e5 72 73 68 69 6c 6f 76 6f 0a     |V.rshilovo.|

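I found several related questions, but none of them gave me a solution. The root cause may be that my XML data is incorrectly encoded, but I have no control over that. I would be perfectly happy to reset illegal values to a default such as "???" or the like.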

2017-08-13 Karel Adams

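It looks like some of the elements have no text, so their text attribute is None. You can use a try-except block, or fall back to a default value when text is None, like this: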

name = (point.find('name').text or '').encode('utf8')

Another example, using an if statement:

name = point.find('name').text 
if name: 
    name = name.encode('utf8')
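
The same idea can be combined with the "???" fallback mentioned in the question. The following is only a minimal sketch, assuming the standard xml.etree.ElementTree module; the sample data and the helper name point_name are hypothetical and not part of the original code. It also covers the case where a point has no name element at all, in which find() itself returns None.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the second <point> has an empty <name/>,
# the third has no <name> element at all.
SAMPLE = (
    u'<?xml version="1.0" encoding="UTF-8"?>'
    u'<points>'
    u'<point><id>1781</id><name>V\u0435rshilovo</name></point>'
    u'<point><id>1782</id><name/></point>'
    u'<point><id>1783</id></point>'
    u'</points>'
).encode('utf-8')


def point_name(point, default=u'???'):
    """Return the UTF-8 encoded name of a <point>, or the default if it is missing or empty."""
    node = point.find('name')
    # find() returns None when the element is absent; .text is None when it is empty.
    text = node.text if node is not None else None
    return (text or default).encode('utf-8')


root = ET.fromstring(SAMPLE)
names = [point_name(p) for p in root.findall('point')]
```

Here names holds one byte string per point, with the placeholder standing in wherever no usable name was found.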

2017-08-13 08:52:08

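I really can't understand how it works, but it does! The output now includes the 'irregular' characters, neatly converted: 00000000 31 37 38 31 20 56 d0 b5 72 73 68 69 6c 6f 76 6f | 1781 V..rshilovo |

I simply use an empty string when 'text' is 'None', so Python doesn't throw an exception. You could also use try-except or if-else, but I like one-liners ;)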
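
For readers wondering why this works, here is a small sketch of the two pieces involved; the variable names are illustrative only. The expression `x or u''` yields the empty string whenever x is None, and encoding the Cyrillic letter U+0435 as UTF-8 produces the two bytes d0 b5 seen in the hexdump above.

```python
import binascii

# "None or default" evaluates to the default, because None is falsy.
missing = None
safe = missing or u''

# The second letter is Cyrillic U+0435, not the Latin "e".
text = u'V\u0435rshilovo'
encoded = text.encode('utf-8')

# Hex view of the encoded bytes, comparable to a hexdump line.
hex_view = binascii.hexlify(encoded)
```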
