Python Unicode error message

I don't understand this error message. Can anyone help me?

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 2: 
ordinal not in range(128)

Here is the code:

import urllib2, os, zipfile 
from lxml import etree 

def xmlSplitter(data, separator=lambda x: x.startswith('<?xml')):
    # Split a stream of concatenated XML documents into single documents.
    buff = []
    for line in data:
        if separator(line):
            if buff:
                yield ''.join(buff)
                buff[:] = []
        buff.append(line)
    yield ''.join(buff)

def first(seq, default=None):
    """Return the first item from sequence, seq or the default(None) value"""
    for item in seq:
        return item
    return default

datasrc = "http://commondatastorage.googleapis.com/patents/grantbib/2011/ipgb20110104_wk01.zip" 
filename = datasrc.split('/')[-1] 

if not os.path.exists(filename):
    # Download the archive only once.
    with open(filename, 'wb') as file_write:
        r = urllib2.urlopen(datasrc)
        file_write.write(r.read())

zf = zipfile.ZipFile(filename) 
xml_file = first([ x for x in zf.namelist() if x.endswith('.xml')]) 
assert xml_file is not None 

count = 0 
for item in xmlSplitter(zf.open(xml_file)): 
    count += 1 
    if count > 10: break 
    doc = etree.XML(item) 
    docID = first(doc.xpath('//publication-reference/document-id/doc-number/text()')) 
    title = first(doc.xpath('//invention-title/text()')) 
    lastName = first(doc.xpath('//addressbook/last-name/text()')) 
    firstName = first(doc.xpath('//addressbook/first-name/text()')) 
    street = first(doc.xpath('//addressbook/address/street/text()')) 
    city = first(doc.xpath('//addressbook/address/city/text()')) 
    state = first(doc.xpath('//addressbook/address/state/text()')) 
    postcode = first(doc.xpath('//addressbook/address/postcode/text()')) 
    country = first(doc.xpath('//addressbook/address/country/text()')) 
    print "DocID: {0}\nTitle: {1}\nLast Name: {2}\nFirst Name: {3}\nStreet: {4}\ncity: {5}\nstate: {6}\npostcode: {7}\ncountry: {8}\n".format(docID,title,lastName,firstName,street,city,state,postcode,country)

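I found this code somewhere online and changed it only slightly, by adding street, city, state, postcode and country.

The XML file contains about 2 million lines. Do you think that is the cause?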

2013-04-07 Gold Skull with Pattern

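It means that ASCII can only handle character values below 128, and u'\xe4' is 228, which is larger. Given your tags, are you parsing an XML document? Then you can rule out having put an 'ä' in your source code. – 2013-04-07 09:51:13

Do you mean the source of my XML? – 2013-04-07 09:58:38

You need to show the code that raises this error. Are you saving a file, concatenating strings, comparing strings, printing to the console, etc.? – 2013-04-07 10:01:27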

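You are parsing XML, and the library already knows how to handle the decoding for you. The API returns unicode objects, but you are trying to treat them as byte strings.

Where you call ''.format(), you are using a Python byte string instead of a unicode object, so Python has to encode the Unicode values to fit them into a byte string. To do so it can only use the default encoding, which is ASCII.

The simple fix is to use a unicode string there instead; note the u'' string literal: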

print u"DocID: {0}\nTitle: {1}\nLast Name: {2}\nFirst Name: {3}\nStreet: {4}\ncity: {5}\nstate: {6}\npostcode: {7}\ncountry: {8}\n".format(docID,title,lastName,firstName,street,city,state,postcode,country)

Python will still have to encode this when printing, but at least now Python can auto-detect your terminal and determine which encoding it needs to use.
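
As a rough illustration of the fix above, the sketch below formats a unicode value with a unicode template and then encodes the result explicitly. The name last_name and its value are hypothetical stand-ins for what doc.xpath() returns. The explicit encode to UTF-8 is an assumption, not part of the original answer; it may help when output is piped to a file, where Python 2 cannot detect a terminal encoding and falls back to ASCII.

```python
# Runs on Python 2.7 and Python 3; the value is a hypothetical stand-in.
last_name = u"Kr\xe4mer"  # like a unicode object returned by doc.xpath(...)

# Unicode template + unicode argument: no implicit ASCII encoding happens.
line = u"Last Name: {0}".format(last_name)

# Encode explicitly, and only at the output boundary (assumed to be UTF-8).
encoded = line.encode("utf-8")
print(repr(encoded))
```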

You may want to read up on Python and Unicode, for example the articles on the subject by Joel Spolsky and by Ned Batchelder.

2013-04-07 10:30:20

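It works, thank you very much! – 2013-04-07 10:42:34

ASCII characters range from 0 (\x00) to 127 (\x7F). Your character (\xE4 = 228) is larger than the highest possible value. Therefore you have to change the codec (for example to UTF-8) to be able to encode this value.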
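
A minimal sketch of that point, using the character from the error message. The variable names are made up for the example; the code only checks the code point and tries both codecs.

```python
ch = u"\xe4"  # the character named in the error message

# Code point of the character; ASCII only covers 0..127.
code_point = ord(ch)
print(code_point)

# Encoding with the ASCII codec fails for this character.
try:
    ch.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# UTF-8 can represent it, using more than one byte.
utf8_bytes = ch.encode("utf-8")
print(ascii_ok, repr(utf8_bytes))
```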

2013-04-07 09:51:18 pascalhein

So how do I change the codec? I'm such a newbie, please help. – 2013-04-07 09:58:59

@EdwardOctavianusPakpahan That depends on your current code. If you have u'\xe4'.encode('ascii'), just change 'ascii' to 'utf-8'. – pascalhein 2013-04-07 10:02:03

I'm parsing an XML file that starts with '<?xml version="1.0" encoding="UTF-8"?>', so I think it is already in UTF-8? – 2013-04-07 10:23:06

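There is no such thing as plain text. Text always has an encoding, which is the way you represent a given symbol (a letter, a comma, a Japanese kanji) as a sequence of bytes. The mapping between the symbol "code" and the bytes is called the encoding.

In Python 2.7, the distinction between encoded text (str) and generic unencoded text (unicode()) is confusing at best. Python 3 drops the whole thing, and you always use the unicode type by default.

In any case, what is happening is that you are trying to read some text and put it into a string, but this text contains something that cannot be coerced into the ASCII encoding. ASCII only understands characters in the range 0-127, which is the standard character set (letters, digits, symbols used in programming). One possible extension of ASCII is Latin-1 (also known as ISO-8859-1), where the range 128-255 maps to Latin characters such as accented a. The advantage of this encoding is that you still get one byte == one character. UTF-8 is another extension of ASCII, where you drop the constraint one byte == one character and allow some characters to be represented with one byte, some with two, and so on.

How to solve your problem depends on where the problem is. My guess is that you are parsing a text file that is encoded in some encoding you don't know, possibly Latin-1 or UTF-8. If so, you have to open the file specifying encoding='utf-8' in open(), but again, it depends. It is hard to tell from what you have provided.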
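
To make the byte counts above concrete, here is a small sketch that encodes three sample characters with each codec. The helper encoded_length and the sample list are hypothetical names chosen for this example.

```python
samples = [u"a", u"\xe4", u"\u65e5"]  # ASCII letter, a-umlaut, a kanji

def encoded_length(text, codec):
    """Return the number of bytes, or None if the codec cannot encode text."""
    try:
        return len(text.encode(codec))
    except UnicodeEncodeError:
        return None

# Map each character to its byte length under ASCII, Latin-1 and UTF-8.
lengths = {}
for ch in samples:
    lengths[ch] = [encoded_length(ch, c) for c in ("ascii", "latin-1", "utf-8")]
    print(repr(ch), lengths[ch])
```

A None entry means the codec has no representation for that character at all, which is the situation behind the UnicodeEncodeError in the question.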
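
A sketch of what opening a file with an explicit encoding can look like. One caveat: in Python 2 the built-in open() has no encoding parameter, so this example uses io.open, which accepts one and is the same function as the built-in open in Python 3. The file and its contents are throwaway sample data created just for the example. This applies to plain text files; in the question's code lxml receives the raw bytes and reads the XML declaration itself.

```python
import io
import os
import tempfile

text = u"Kr\xe4mer\n"  # hypothetical sample content

# Create a temporary file so the example is self-contained.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    # Encode on write: unicode text -> UTF-8 bytes on disk.
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)
    # Decode on read: UTF-8 bytes on disk -> unicode text.
    with io.open(path, "r", encoding="utf-8") as f:
        read_back = f.read()
finally:
    os.remove(path)

print(read_back == text)
```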

2013-04-07 10:03:13

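+1, nice explanation. – pascalhein 2013-04-07 10:04:31

I'm parsing an XML file, and at the top of the XML file it says '<?xml version="1.0" encoding="UTF-8"?>'. So I assume the XML file is already encoded as UTF-7? If I need to change my code, where is the best place to do it? – 2013-04-07 10:24:33

No, not UTF-7. UTF-8, which is different! Anyway, yes, the XML is encoded that way and it does contain non-ASCII characters, so you will need the appropriate codec. – 2013-04-07 10:31:01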
