I don't understand this error code. Can anyone help me? The Python Unicode error message is:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 2:
ordinal not in range(128)
Here is the code:
import urllib2, os, zipfile
from lxml import etree

def xmlSplitter(data,separator=lambda x: x.startswith('<?xml')):
    buff = []
    for line in data:
        if separator(line):
            if buff:
                yield ''.join(buff)
                buff[:] = []
        buff.append(line)
    yield ''.join(buff)

def first(seq,default=None):
    """Return the first item from sequence, seq or the default(None) value"""
    for item in seq:
        return item
    return default

datasrc = "http://commondatastorage.googleapis.com/patents/grantbib/2011/ipgb20110104_wk01.zip"
filename = datasrc.split('/')[-1]

if not os.path.exists(filename):
    with open(filename,'wb') as file_write:
        r = urllib2.urlopen(datasrc)
        file_write.write(r.read())

zf = zipfile.ZipFile(filename)
xml_file = first([ x for x in zf.namelist() if x.endswith('.xml')])
assert xml_file is not None

count = 0
for item in xmlSplitter(zf.open(xml_file)):
    count += 1
    if count > 10: break
    doc = etree.XML(item)
    docID = first(doc.xpath('//publication-reference/document-id/doc-number/text()'))
    title = first(doc.xpath('//invention-title/text()'))
    lastName = first(doc.xpath('//addressbook/last-name/text()'))
    firstName = first(doc.xpath('//addressbook/first-name/text()'))
    street = first(doc.xpath('//addressbook/address/street/text()'))
    city = first(doc.xpath('//addressbook/address/city/text()'))
    state = first(doc.xpath('//addressbook/address/state/text()'))
    postcode = first(doc.xpath('//addressbook/address/postcode/text()'))
    country = first(doc.xpath('//addressbook/address/country/text()'))
    print "DocID: {0}\nTitle: {1}\nLast Name: {2}\nFirst Name: {3}\nStreet: {4}\ncity: {5}\nstate: {6}\npostcode: {7}\ncountry: {8}\n".format(docID,title,lastName,firstName,street,city,state,postcode,country)
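I found this code somewhere online and changed it only slightly: I added street, city, state, postcode, and country.
The XML file contains about 2 million lines. Do you think that is the cause?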
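This means that ASCII can only handle character values below 128, and u'\xe4' is 228, which is larger. Given your tags, are you parsing an XML document? If so, you could simply drop the 'ä' in the source. – 2013-04-07 09:51:13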
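To make the point about 128 versus 228 concrete, here is a minimal sketch. It only uses the character from the error message; the variable names are made up for illustration, and the snippet is written so that it should run on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Illustration only: the names below are hypothetical, not from the question's script.
ch = u'\xe4'  # the character named in the error message ('ä')

# ASCII covers code points 0..127; this one is 228.
code_point = ord(ch)
print(code_point)

# Encoding to ASCII fails for any code point above 127.
try:
    ch.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)

# UTF-8 can represent the same character, using two bytes (0xC3 0xA4).
utf8_bytes = ch.encode('utf-8')
print(len(utf8_bytes))
```

So the size of the XML file is probably not the issue; a single non-ASCII character anywhere in the extracted text is enough to trigger the error.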
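Do you mean the source of my XML? – 2013-04-07 09:58:38
You need to show the code that raises this error. Are you saving a file, concatenating strings, doing string comparisons, printing to the console, etc.? – 2013-04-07 10:01:27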
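The thread does not confirm which line raises the error. One plausible candidate, and this is an assumption, is the final print statement: in Python 2, calling .format() on a plain byte string with unicode arguments makes Python convert them with the ASCII codec. A minimal sketch of a workaround follows. The helper names (format_record, write_utf8) and the sample values are hypothetical, and only three of the nine fields are shown.

```python
# -*- coding: utf-8 -*-
import io

def format_record(doc_id, title, last_name):
    # Hypothetical helper: a unicode template (note the u prefix) keeps the
    # whole result as text, so no implicit ASCII conversion takes place.
    template = u"DocID: {0}\nTitle: {1}\nLast Name: {2}\n"
    return template.format(doc_id, title, last_name)

def write_utf8(stream, text):
    # Encode explicitly at the output boundary instead of relying on the
    # default codec.
    stream.write(text.encode('utf-8'))

# Made-up sample values; the last name contains the character from the error.
record = format_record(u'0000000', u'Sample title', u'H\xe4kkinen')

buf = io.BytesIO()  # stands in for a file opened in binary mode
write_utf8(buf, record)
```

In the original Python 2 script, the equivalent change would be to prefix the format string with u and call .encode('utf-8') on the formatted result before printing it. Whether that is the right output encoding depends on your console.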