2017-04-07 52 views
0

Pandas - column not read though present

I have the following set of data.

url, team1, team2, win_toss, bat_or_bowl, outcome, win_game, date,day_n_night, ground, rain, duckworth_lewis, match_id, type_of_match 
"espncricinfo-t20/145227.html","Western Australia","Victoria","Victoria","bat","Western Australia won by 8 wickets (with 47 balls remaining)","Western Australia"," Jan 12 2005","1"," Western Australia Cricket Association Ground,Perth","0","0","145227","T20" 
"espncricinfo-t20/212961.html","Australian Institute of Sports","New Zealand Academy","New Zealand Academy","bowl","Match tied",""," Jul 7 2005 ","0"," Albury Oval, Brisbane","0","0","212961","T20" 
"espncricinfo-t20/216598.html","Air India","New South Wales","Air India","bowl","Air India won by 7 wickets (with 5 balls remaining)","Air India"," Aug 19 2005 ","0"," M Chinnaswamy Stadium, Bangalore","0","0","216598","T20" 
"espncricinfo-t20/216620.html","Karnataka State Cricket Association XI","Bradman XI","Bradman XI","bowl","Karnataka State Cricket Association XI won by 33 runs","Karnataka State Cricket Association XI"," Aug 20 2005 ","0"," M Chinnaswamy Stadium, Bangalore","0","0","216620","T20" 
"espncricinfo-t20/216633.html","Chemplast","Bradman XI","Chemplast","bat","Bradman XI won by 6 wickets (with 13 balls remaining)","Bradman XI"," Aug 20 2005 ","0"," M Chinnaswamy Stadium, Bangalore","0","0","216633","T20" 

This is the Python console:


>>> import pandas as pd 
>>> df = pd.read_csv("sample.txt" , quotechar = '\"') 
>>> df.shape 
(9, 14) 


>>> df.columns 
Index([u'url', u' team1', u' team2', u' win_toss', u' bat_or_bowl', 
     u' outcome', u' win_game', u' date', u' day_n_night', u' ground', 
     u' rain', u' duckworth_lewis', u' match_id', u' type_of_match'], 
     dtype='object') 


>>> df.url.head() 
0 espncricinfo-t20/145227.html 
1 espncricinfo-t20/212961.html 
2 espncricinfo-t20/216598.html 
3 espncricinfo-t20/216620.html 
4 espncricinfo-t20/216633.html 
Name: url, dtype: object 


>>> df.team1.head() 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
    File "/usr/local/python27/lib/python2.7/site-packages/pandas/core/generic.py", line 2744, in __getattr__ 
    return object.__getattribute__(self, name) 
AttributeError: 'DataFrame' object has no attribute 'team1' 



>>> df.iloc[1:2] 
          url       team1 \ 
1 espncricinfo-t20/212961.html Australian Institute of Sports 

       team2    win_toss bat_or_bowl  outcome \ 
1 New Zealand Academy New Zealand Academy   bowl Match tied 

    win_game   date day_n_night     ground rain \ 
1  NaN Jul 7 2005    0 Albury Oval, Brisbane  0 

    duckworth_lewis match_id type_of_match 
1     0  212961   T20 

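We can see that the column team1 exists, but I cannot retrieve it from df. This error occurs for every column except the first one. Can anyone help me find the problem here? Thanks.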

Answers

0

You have a leading space:

u' team1' 

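in the column name, so the lookup fails: df.team1 raises AttributeError (as in your traceback), and df['team1'] would raise KeyError.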

Do this:

pd.read_csv("sample.txt" , quotechar = '\"', skipinitialspace=True) 

so that the CSV reader skips the spaces that follow each delimiter.

See the docs.
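As a minimal sketch of the difference, the snippet below reads a three-column excerpt of the question's data from an in-memory string instead of sample.txt. The excerpt and the variable names (raw, default_df, clean_df) are assumptions made for illustration; only the skipinitialspace parameter comes from the answer above.

```python
import io

import pandas as pd

# Excerpt in the same layout as the question's file:
# a space follows every comma in the header line.
raw = (
    'url, team1, team2\n'
    '"espncricinfo-t20/145227.html","Western Australia","Victoria"\n'
    '"espncricinfo-t20/212961.html","Australian Institute of Sports","New Zealand Academy"\n'
)

# Default parsing keeps the space as part of each column label.
default_df = pd.read_csv(io.StringIO(raw))

# skipinitialspace=True drops spaces that directly follow a delimiter.
clean_df = pd.read_csv(io.StringIO(raw), skipinitialspace=True)

print(list(default_df.columns))
print(list(clean_df.columns))
print(clean_df.team1.head())
```

Note that skipinitialspace only affects spaces immediately after a delimiter, so any trailing whitespace in a label would still need to be stripped separately.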

+0

Thanks EdChum. This looks more elegant. – ANI

1

There are spaces in the column names; you need to remove them with strip:

df.columns = df.columns.str.strip() 
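A small sketch of this approach on an already-loaded frame follows. The one-row DataFrame is a hypothetical stand-in for the question's data, built by hand so the example runs without the file.

```python
import pandas as pd

# Hypothetical stand-in for the frame from the question:
# every label except the first starts with a space.
df = pd.DataFrame(
    [["espncricinfo-t20/145227.html", "Western Australia", "Victoria"]],
    columns=["url", " team1", " team2"],
)

# Index.str.strip() removes leading and trailing whitespace from each label.
df.columns = df.columns.str.strip()

print(list(df.columns))
print(df.team1.iloc[0])
```

Because strip works on both ends of each label, it also covers trailing whitespace, and it can be applied after the file has been read without changing the read_csv call.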
+0

I will try it. Thanks. But why does df.shape show the exact number of columns? – ANI

+0

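The columns are an array of items, so the length is correct. The only problem is the items themselves: some contain a leading space, u' team1', u' team2', but what is needed is u'team1', u'team2'. – jezrael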
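To illustrate the point, here is a short sketch with a hypothetical two-column frame: shape only counts the labels, while lookups need an exact match on the label text, including the space.

```python
import pandas as pd

# Hypothetical two-column frame whose second label has a leading space.
df = pd.DataFrame([["a.html", "Western Australia"]], columns=["url", " team1"])

# shape counts labels; whitespace inside a label does not change the count.
print(df.shape)

# The exact label, space included, still works.
exact = df[" team1"].iloc[0]

# Bracket lookup with the stripped name fails with KeyError.
try:
    df["team1"]
    bracket_error = None
except KeyError as exc:
    bracket_error = type(exc).__name__

# Attribute lookup with the stripped name fails with AttributeError.
try:
    df.team1
    attr_error = None
except AttributeError as exc:
    attr_error = type(exc).__name__

print(exact, bracket_error, attr_error)
```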

+0

Thanks, +1, a new way to handle it. I will go with EdChum's answer, since it looks more elegant. Thanks for your response! – ANI