Reading a CSV file into a Pandas DataFrame with invalid characters (accents)

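I want to read a CSV file into a pandas DataFrame, but the CSV contains accented characters. I am using Python 2.7.

I am getting a UnicodeDecodeError because there are accents in the first column. I have read a bunch of sites, such as this SO question about UTF-8 in CSV files, this blog post on CSV errors related to newlines, and this blog post on UTF-8 issues in Python 2.7.

I tried to modify my code using the answers I found there. Originally I had: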

import pandas as pd 

#Create a dataframe with the data we are interested in 
df = pd.DataFrame.from_csv('MYDATA.csv') 
mode = lambda ts: ts.value_counts(sort=True).index[0] 
cols = df['CompanyName'].value_counts().index 
df['Calls'] = df.groupby('CompanyName')['CompanyName'].transform(pd.Series.value_counts)

Et cetera. It was working, but now that accented strings are being passed in as customer names, it gives this error:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xea in position 7: invalid continuation byte
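
For what it's worth, the message itself hints at the cause: in UTF-8, byte 0xea can only start a three-byte sequence, so a lone 0xea followed by ordinary ASCII is rejected, while in Latin-1 the same byte is simply "ê". A minimal sketch of that, using a made-up sample value (`raw` is hypothetical and not taken from MYDATA.csv):

```python
# Hypothetical sample: the bytes of "Voce Ltd" with a circumflex on the e,
# as they would be stored in a Latin-1 file (not taken from MYDATA.csv).
raw = b'Voc\xea Ltd'

try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError as err:
    utf8_ok = False
    bad_position = err.start  # index of the byte the UTF-8 decoder rejected

# Latin-1 maps every possible byte to a character, so this decode cannot fail.
latin1_text = raw.decode('latin-1')
```

If the same bytes decode cleanly as Latin-1 (or cp1252) but not as UTF-8, the file was probably not saved as UTF-8 in the first place.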

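I tried changing the line to df = pd.read_csv('MYDATA.csv', encoding='utf-8'), but that gives the same error.

So I tried a suggestion I found in my research, but it does not work either, and I get the same error.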

import pandas as pd 
import csv 

def unicode_csv_reader(utf8_data, dialect=csv.excel, **kwargs): 
    csv_reader = csv.reader(utf8_data, dialect=dialect, **kwargs) 
    for row in csv_reader: 
        yield [unicode(cell, 'utf-8') for cell in row]


reader = unicode_csv_reader(open('MYDATA.csv','rU'), dialect = csv.reader) 
#Create a dataframe with the data we are interested in 
df = pd.DataFrame(reader)

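I feel like reading CSV data into a pandas DataFrame should not be this hard. Does anyone know a simpler way?

Edit: What is really strange is that if I delete the rows with accented characters, I still get the error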

UnicodeDecodeError: 'utf8' codec can't decode byte 0xd0 in position 960: invalid continuation byte.

This is strange because my test CSV has 19 rows and 27 columns. But I was hoping that if I decode the whole CSV as utf8, it would fix the problem.
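
One way to see why the error survives deleting the visibly accented rows is to list every line that is not valid UTF-8, since the offending bytes may sit in rows or columns that look fine on screen. A rough sketch, assuming a hypothetical helper name `find_non_utf8_lines`; it reads the file as raw bytes so nothing is decoded implicitly:

```python
import io


def find_non_utf8_lines(path):
    """Return (line_number, offset_in_line, raw_line) for each line that is not valid UTF-8.

    Hypothetical helper for illustration; not part of pandas or the csv module.
    """
    problems = []
    with io.open(path, 'rb') as handle:  # binary mode: no implicit decoding
        for line_number, raw_line in enumerate(handle, start=1):
            try:
                raw_line.decode('utf-8')
            except UnicodeDecodeError as err:
                # err.start is the index of the rejected byte within this line
                problems.append((line_number, err.start, raw_line))
    return problems
```

Calling `find_non_utf8_lines('MYDATA.csv')` and printing `repr()` of each returned line should show exactly which bytes are tripping the decoder.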

Source

2015-06-19 jenryb

Please don't use 'from_csv'; it is no longer updated. Use the top-level 'read_csv' instead. Please try this: 'df = pd.read_csv('MYDATA.csv', encoding='utf-8')' – EdChum

Yes, I tried that too, but I get the error "AttributeError: type object 'DataFrame' has no attribute 'read_csv'" if my line is df = pd.DataFrame.read_csv('testing2.csv', encoding='utf-8'). Otherwise, if I use two lines, ra = pd.read_csv('testing2.csv', encoding='utf-8') // df = DataFrame(ra), I get the same UnicodeDecodeError. – jenryb

Well, the error is there because, if you read my code carefully, it says 'pd.read_csv', so: 'import pandas as pd; df = pd.read_csv('MYDATA.csv', encoding='utf-8')' – EdChum
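
If the file turns out not to be UTF-8 at all, the `encoding` argument can name another codec. The sketch below is an illustration under assumptions rather than a confirmed fix for MYDATA.csv: it tries UTF-8 first and falls back to Latin-1, which accepts any byte but may show the wrong characters if the file is really in some other encoding (byte 0xd0 in the second traceback could point that way). `read_csv_with_fallback` is a hypothetical helper name, and the encoding list is a guess.

```python
import pandas as pd


def read_csv_with_fallback(path, encodings=('utf-8', 'latin-1'), **kwargs):
    """Try each encoding in turn and return the first DataFrame that parses.

    Hypothetical helper; the default encodings are an assumption about the file.
    """
    last_error = None
    for encoding in encodings:
        try:
            return pd.read_csv(path, encoding=encoding, **kwargs)
        except UnicodeDecodeError as err:
            last_error = err  # remember the failure and try the next codec
    if last_error is None:
        raise ValueError('no encodings were given')
    raise last_error
```

Usage would look like `df = read_csv_with_fallback('MYDATA.csv')`. It is worth checking a few accented names in the result by eye, because a wrong-but-permissive codec fails silently.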

Try adding this to the top of your script:

import sys 
reload(sys) 
sys.setdefaultencoding('utf8')

Source

2015-06-19 19:25:00 GNMO11

Thanks for your input! However, I get the same error. – jenryb

-1

I know how annoying it is to hit errors in read_csv. You can try df = pd.read_csv(filename, sep='', error_bad_lines=False). It skips the bad lines, which can save a lot of time.
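
Note that `error_bad_lines` was deprecated in pandas 1.3 in favour of `on_bad_lines`, and as far as I know both only deal with rows that have too many fields, not with bytes that fail to decode. A small sketch with made-up inline data, assuming pandas 1.3 or later:

```python
import io

import pandas as pd

# Made-up data: the third line has three fields where the header declares two.
sample = io.StringIO(u"CompanyName,Calls\nAcme,3\nBroken,1,extra\nGlobex,5\n")

# on_bad_lines='skip' drops the malformed row instead of raising a ParserError.
df = pd.read_csv(sample, on_bad_lines='skip')
```

So this may hide structurally broken rows, but a UnicodeDecodeError like the one in the question would most likely still be raised.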

Source

2016-04-14 01:11:21
