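I hit a roadblock while trying to read CSV files with Python: an encoding problem when reading the files.
UPDATE: If you just want to skip the characters that cause errors, you can open the file like this: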
with open(os.path.join(directory, file), 'r', encoding="utf-8", errors="ignore") as data_file:
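To make the effect of errors="ignore" concrete, here is a minimal sketch. The sample bytes and the temporary file are made up for illustration. It only shows that bytes which cannot be decoded as UTF-8 are silently dropped while reading instead of raising an error; it says nothing about what happens later when the text is printed.

```python
import os
import tempfile

# Hypothetical sample: b"\xe9" is "é" in cp1252/latin-1, but not valid UTF-8 here.
raw = b"id,text\n1,caf\xe9 au lait\n"

# Write the bytes to a throwaway file so open() can be used as in the update.
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    with open(path, "r", encoding="utf-8", errors="ignore", newline="") as data_file:
        content = data_file.read()
finally:
    os.remove(path)

# The undecodable byte is gone; everything else is intact.
print(repr(content))
```

The trade-off is data loss: the dropped character cannot be recovered afterwards, so this is mostly useful when the exact text does not matter.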
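Here is what I have tried so far:
    import csv
    import os

    # root_dir is the top-level folder that contains the CSV files
    for directory, subdirectories, files in os.walk(root_dir):
        for file in files:
            # No encoding is passed, so open() uses the locale default
            with open(os.path.join(directory, file), 'r') as data_file:
                reader = csv.reader(data_file)
                for row in reader:
                    print(row)
The error I get is: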
UnicodeEncodeError: 'charmap' codec can't encode characters in position 224-225: character maps to <undefined>
I have also tried:
with open(os.path.join(directory, file), 'r', encoding="UTF-8") as data_file:
Error:
UnicodeEncodeError: 'charmap' codec can't encode character '\u2026' in position 223: character maps to <undefined>
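One detail in these tracebacks is worth pointing out: a UnicodeEncodeError is raised when Python turns text into bytes (for example when print writes to a console), not when it decodes the bytes of a file. The sketch below reproduces the same kind of message under the assumption that the console uses a legacy code page such as cp437. That code page is a guess, not something stated in the question, and the helper name to_console_safe is hypothetical.

```python
def to_console_safe(text, console_encoding):
    """Return text with characters the console cannot encode shown as escapes."""
    # backslashreplace turns unencodable characters into \uXXXX escapes
    encoded = text.encode(console_encoding, errors="backslashreplace")
    return encoded.decode(console_encoding)


sample = "He was tall\u2026"  # U+2026 is the ellipsis named in the traceback

# Assumed console code page: cp437 has no mapping for U+2026.
try:
    sample.encode("cp437")
    error_message = None
except UnicodeEncodeError as exc:
    error_message = str(exc)

safe = to_console_safe(sample, "cp437")
print(error_message)
print(safe)
```

If the console really is the culprit, changing how output is encoded may be worth trying, for example sys.stdout.reconfigure(encoding="utf-8") on Python 3.7+ or the PYTHONIOENCODING environment variable. Whether the terminal then displays the characters correctly depends on the terminal itself.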
Now, if I just print data_file, it reports the encoding as cp1252, but if I try
with open(os.path.join(directory, file), 'r', encoding="cp1252") as data_file:
I get this error:
UnicodeEncodeError: 'charmap' codec can't encode characters in position 224-225: character maps to <undefined>
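A possible explanation for why printing data_file shows cp1252: the encoding reported by a file object is whatever open() was told to use (or the locale default when nothing is passed), not something detected from the file's contents. A minimal sketch with a made-up temporary file:

```python
import os
import tempfile

# Write bytes that are valid UTF-8 (U+2026 is E2 80 A6 in UTF-8).
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write("tall\u2026\n".encode("utf-8"))
    path = tmp.name

try:
    # The attribute reflects the argument, whatever the bytes really are.
    with open(path, "r", encoding="cp1252") as as_cp1252:
        reported_cp1252 = as_cp1252.encoding
        misread = as_cp1252.read()
    with open(path, "r", encoding="utf-8") as as_utf8:
        reported_utf8 = as_utf8.encoding
        correct = as_utf8.read()
finally:
    os.remove(path)

print(reported_cp1252, reported_utf8)
# ascii() keeps the output printable on any console
print(ascii(misread), ascii(correct))
```

Reading the UTF-8 bytes as cp1252 does not fail here; it quietly produces three wrong characters in place of the ellipsis, which is why the reported encoding alone is not a reliable guide.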
I also tried the recommended package.
The error I get is:
UnicodeEncodeError: 'charmap' codec can't encode characters in position 224-225: character maps to <undefined>
The line I am trying to parse is:
2015-11-28 22:23:58,670805374291832832,479174464,"MarkCrawford15","RT @WhatTheFFacts: The tallest man in the world was Robert Pershing Wadlow of Alton, Illinois. He was slighty over 8 feet 11 inches tall.","None
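For reference, a row shaped like this can be parsed in isolation to check that the csv module itself copes with the quoted commas and a non-ASCII character. The row below is a shortened, hypothetical stand-in modeled on the line above (the real line is cut off), with an ellipsis added so the problem character is present.

```python
import csv
import io

# Shortened, hypothetical row modeled on the line above; not the real data.
line = (
    '2015-11-28 22:23:58,670805374291832832,479174464,"MarkCrawford15",'
    '"RT @WhatTheFFacts: The tallest man, of Alton, Illinois\u2026"\n'
)

# newline="" leaves line endings untouched, as the csv module expects.
rows = list(csv.reader(io.StringIO(line, newline="")))
row = rows[0]

print(len(row))
# ascii() avoids any console encoding issue while inspecting the field
print(ascii(row[4]))
```

If parsing succeeds like this and only printing the row fails, that points at the output side rather than at the file or the csv module.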
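Any ideas or help would be appreciated.
cp1252, according to Google, is a Windows character encoding. What is your environment, and where do the files come from? For example, if you open the CSV file in nano, does it say it is in DOS format? – Ogaday
I don't understand what you mean by opening the file in nano... I'm on a Windows machine. – user3271518
Oh, OK. I thought you might be on Unix. I have had trouble parsing DOS-format files on Linux before and thought this might be a similar problem. Nano is a terminal text editor commonly found on Linux systems. – Ogaday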