我試圖導入CSV文件,以培養我的分類,但我不斷收到此錯誤Unicode的錯誤ASCII不能編碼字符
traceback (most recent call last):
File "updateClassif.py", line 17, in <module>
myClassif = NaiveBayesClassifier(fp, format="csv")
File "C:\Python27\lib\site-packages\textblob\classifiers.py", line 191, in __init__
super(NLTKClassifier, self).__init__(train_set, feature_extractor, format, **kwargs)
File "C:\Python27\lib\site-packages\textblob\classifiers.py", line 123, in __init__
self.train_set = self._read_data(train_set, format)
File "C:\Python27\lib\site-packages\textblob\classifiers.py", line 143, in _read_data
return format_class(dataset, **self.format_kwargs).to_iterable()
File "C:\Python27\lib\site-packages\textblob\formats.py", line 68, in __init__
self.data = [row for row in reader]
File "C:\Python27\lib\site-packages\textblob\unicodecsv\__init__.py", line 106, in next
row = self.reader.next()
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe6' in position 55: ordinal not in range(128)
CSV文件包含160萬行的鳴叫所以我相信一些推文包含特殊字符。我嘗試使用開放式辦公室保存它作爲某人推薦,但仍然是相同的結果。我也嘗試使用拉丁編碼,但結果相同。 這是我的代碼:
with codecs.open('tr.csv', 'r' ,encoding='latin-1') as fp:
myClassif = NaiveBayesClassifier(fp, format="csv")
這是從庫中我使用的代碼:
def __init__(self, csvfile, fieldnames=None, restkey=None, restval=None,
dialect='excel', encoding='utf-8', errors='strict', *args,
**kwds):
if fieldnames is not None:
fieldnames = _stringify_list(fieldnames, encoding)
csv.DictReader.__init__(self, csvfile, fieldnames, restkey, restval, dialect, *args, **kwds)
self.reader = UnicodeReader(csvfile, dialect, encoding=encoding,
errors=errors, *args, **kwds)
if fieldnames is None and not hasattr(csv.DictReader, 'fieldnames'):
# Python 2.5 fieldnames workaround. (http://bugs.python.org/issue3436)
reader = UnicodeReader(csvfile, dialect, encoding=encoding, *args, **kwds)
self.fieldnames = _stringify_list(reader.next(), reader.encoding)
self.unicode_fieldnames = [_unicodify(f, encoding) for f in
self.fieldnames]
self.unicode_restkey = _unicodify(restkey, encoding)
def next(self):
row = csv.DictReader.next(self)
result = dict((uni_key, row[str_key]) for (str_key, uni_key) in
izip(self.fieldnames, self.unicode_fieldnames))
rest = row.get(self.restkey)
請發佈追溯的**全文**。另外,請指出您使用的Python版本。 – MattDMo
它可能是utf = 8編碼的。試試看。 – tdelaney
@tdelaney我嘗試過使用utf = 8並且它返回給我這個:「UnicodeDecodeError:'utf8'編解碼器無法解碼位置35中的字節0xe6:無效繼續字節」 – Pca