2013-04-23

How do I count the elements of a column, grouped by several columns, with Python pandas?

I have a CSV file with the following structure:

'Country'  'City'  'Year' 'Month' 'Value' 'Street_Code' 
USA   New York 1971  jan  0.0  1 
USA   New York 1971  feb  23.5  1 
USA   New York 1971  mar  10.2  1 
USA   Florida  1971  jan  0.0  1 
USA   Florida  1971  feb  0.0  1 
USA   Florida  1971  mar  0.0  1 
USA   New York 1971  jan  0.0  2 
USA   New York 1971  feb  15.0  2 
USA   New York 1971  mar  7.6  2 
USA   Florida  1971  jan  0.0  2 
USA   Florida  1971  feb  0.0  2 
USA   Florida  1971  mar  2.3  2 

I want to count the number of zeros (0.0) in 'Value', grouped by 'Country', 'City', 'Year' and 'Street_Code'.

What I have tried so far:

import pandas as pd 
data = pd.read_csv('country_details.csv') 
count_data = data[data['Value'] == 0.0] # I'm filtering the data. I don't think this is the right way of doing it 
grouped = count_data.groupby(['Country','Year','Month','Street_Code']) # I'm stuck here 
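From the point where this attempt gets stuck, one possible route (a sketch, not the accepted answer) is to skip the filtering step: build a boolean mask that is True where Value is zero and sum it per group. The sketch assumes the file is comma-separated with the six columns shown above; the sample rows are inlined with io.StringIO in place of country_details.csv, and it groups by the four columns named in the question (Country, City, Year, Street_Code). CSV_TEXT, is_zero and zero_counts are illustrative names only.

```python
import io

import pandas as pd

# Sample rows from the question, inlined so the sketch runs on its own.
# Assumption: the real file is comma-separated with these six columns.
CSV_TEXT = """Country,City,Year,Month,Value,Street_Code
USA,New York,1971,jan,0.0,1
USA,New York,1971,feb,23.5,1
USA,New York,1971,mar,10.2,1
USA,Florida,1971,jan,0.0,1
USA,Florida,1971,feb,0.0,1
USA,Florida,1971,mar,0.0,1
USA,New York,1971,jan,0.0,2
USA,New York,1971,feb,15.0,2
USA,New York,1971,mar,7.6,2
USA,Florida,1971,jan,0.0,2
USA,Florida,1971,feb,0.0,2
USA,Florida,1971,mar,2.3,2
"""

data = pd.read_csv(io.StringIO(CSV_TEXT))

# True where Value is exactly zero, False elsewhere.
is_zero = data["Value"].eq(0.0)

# Group the mask by the four key columns; summing booleans counts the True values.
zero_counts = is_zero.groupby(
    [data["Country"], data["City"], data["Year"], data["Street_Code"]]
).sum()

print(zero_counts)
```

Because summing a boolean Series counts its True values, every group appears in the result, and a group with no zeros at all would simply get 0.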

Is your data being parsed correctly by read_csv? – elyase 2013-04-23 14:16:38


Yes, no problems there. – richie 2013-04-23 14:17:40


If your data is parsed correctly by 'read_csv', why do you have a column named '0'? I would have thought it would be 'data[data['Value'] == 0]'. – DSM 2013-04-23 14:19:13
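To illustrate the comment, a minimal sketch with a small hypothetical frame (only a Value column, values taken from the sample): there is no column labelled '0' unless the file header contains one, and for a float column such as Value, comparing against the integer 0 or the float 0.0 selects the same rows.

```python
import pandas as pd

# Hypothetical frame holding a few of the question's Value entries.
data = pd.DataFrame({"Value": [0.0, 23.5, 10.2, 0.0]})

# Column labels come from the header; there is no column called '0'.
print("0" in data.columns)  # False

# Comparing the float column with 0 or with 0.0 yields the same boolean mask.
print((data["Value"] == 0).equals(data["Value"] == 0.0))  # True

# Selecting rows by the real column label works as expected.
print(data[data["Value"] == 0]["Value"].tolist())  # [0.0, 0.0]
```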

Answers


Your filtering is almost right, but you need to refer to the column by its name, in this case 'Value'.

Try:

import pandas as pd
from io import StringIO  # Python 3; the Python 2 original used the StringIO module

csv = StringIO("""Country,City,Year,Month,Value,Street_Code
USA,NewYork,1971,jan,0.0,1
USA,NewYork,1971,feb,23.5,1
USA,NewYork,1971,mar,10.2,1
USA,Florida,1971,jan,0.0,1
USA,Florida,1971,feb,0.0,1
USA,Florida,1971,mar,0.0,1
USA,NewYork,1971,jan,0.0,2
USA,NewYork,1971,feb,15.0,2
USA,NewYork,1971,mar,7.6,2
USA,Florida,1971,jan,0.0,2
USA,Florida,1971,feb,0.0,2
USA,Florida,1971,mar,2.3,2""")

data = pd.read_csv(csv)

# Keep only the rows whose Value is zero
datasub = data[data['Value'] == 0.0]

# Count the remaining rows in each group
print(datasub.groupby(['Country','Year','Month','Street_Code'])['Value'].count())

Country  Year  Month  Street_Code
USA      1971  feb    1              1
                      2              1
               jan    1              2
                      2              2
               mar    1              1
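Filtering first drops every group that contains no zero: (USA, 1971, mar, 2) is absent from the output above because its only March value for street code 2 is 2.3. If those groups should appear with a count of 0, one option is to sum a boolean mask instead of counting filtered rows. This is a sketch assuming the same sample data and grouping as the answer; the helper column is_zero and the names keys, filtered and summed are hypothetical.

```python
import io

import pandas as pd

CSV_TEXT = """Country,City,Year,Month,Value,Street_Code
USA,New York,1971,jan,0.0,1
USA,New York,1971,feb,23.5,1
USA,New York,1971,mar,10.2,1
USA,Florida,1971,jan,0.0,1
USA,Florida,1971,feb,0.0,1
USA,Florida,1971,mar,0.0,1
USA,New York,1971,jan,0.0,2
USA,New York,1971,feb,15.0,2
USA,New York,1971,mar,7.6,2
USA,Florida,1971,jan,0.0,2
USA,Florida,1971,feb,0.0,2
USA,Florida,1971,mar,2.3,2
"""

data = pd.read_csv(io.StringIO(CSV_TEXT))
keys = ["Country", "Year", "Month", "Street_Code"]

# Filter-then-count (the answer's approach): groups without zeros disappear.
filtered = data[data["Value"] == 0.0].groupby(keys)["Value"].count()

# Mask-then-sum: every group is kept; groups without zeros get 0.
summed = (
    data.assign(is_zero=data["Value"].eq(0.0))
    .groupby(keys)["is_zero"]
    .sum()
)

print(len(filtered), len(summed))  # 5 6
print(summed.loc[("USA", 1971, "mar", 2)])  # 0
```

Which form is preferable depends on whether a missing group or an explicit 0 is more convenient downstream.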