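Pandas - Category variable and groupby - Is this a bug?

I came across a strange result while playing with pandas, and I am not sure why it behaves this way. I am wondering whether it is a bug.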

import pandas as pd

# Group by a plain string column and look up one cell of the result
cf = pd.DataFrame({'sc': ['b', 'b', 'c', 'd'], 'nn': [1, 2, 3, 4], 'mvl': [10, 20, 30, 40]})
df = cf.groupby('sc').mean()
df.loc['b', 'mvl']

This gives "15.0" as the result.

cf1 = cf 
cf1['sc'] = cf1['sc'].astype('category', categories=['b', 'c', 'd'], ordered = True) 
df1 = cf1.groupby('sc').mean() 
df1.loc['b','mvl']

This gives a Series as the result:

sc
b    15.0
Name: mvl, dtype: float64

type(df1.loc['b','mvl']) -> pandas.core.series.Series

type(df.loc['b','mvl']) -> numpy.float64

Why would declaring the variable as categorical change the output of loc from a scalar to a Series?

I hope this is not a stupid question. Thanks!

Source

2016-06-11 Luk17

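Running version '0.18.1', I get a result of type 'numpy.float64' in both cases. – tmthydvnprt

I am running 0.18.0; I thought I was running the latest version. Thanks a lot for checking with 0.18.1. I will update and remove the ugly fix I had to add to make the behavior consistent. – Luk17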
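The workaround mentioned in the comment above is not shown in the thread. Purely as an illustration of what such a fix might look like, here is a minimal sketch of a helper that unwraps a length-one Series so both code paths hand back a scalar. The name `unwrap_single` is hypothetical and is not taken from the original post.

```python
import numpy as np
import pandas as pd


def unwrap_single(value):
    """Return the lone element of a length-one Series; pass anything else through.

    Hypothetical helper, not the asker's actual workaround.
    """
    if isinstance(value, pd.Series) and len(value) == 1:
        return value.iloc[0]
    return value


# A length-one Series, shaped like the result reported on 0.18.0
wrapped = pd.Series([15.0], index=['b'], name='mvl')
print(unwrap_single(wrapped))

# A value that is already a scalar is returned unchanged
print(unwrap_single(np.float64(15.0)))
```

On a version with the fix, the helper should be a no-op, so it can be removed after upgrading.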

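This is probably a pandas bug. The difference is that when you group by a categorical variable, you get a CategoricalIndex. You can see it more simply without any groupby: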

import pandas

# Same values, once as a plain index and once as an ordered categorical index
nocat = pandas.Series(['a', 'b', 'c'])
cat = nocat.astype('category', categories=['a', 'b', 'c'], ordered=True)
xno = pandas.Series([8, 88, 888], index=nocat)
xcat = pandas.Series([8, 88, 888], index=cat)

>>> xno.loc['a'] 
8 
>>> xcat.loc['a'] 
a 8 
dtype: int64

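The docs note that indexing operations on a CategoricalIndex preserve the categorical index. Apparently they do this even when you get only a single result, which does not strictly contradict the docs, but seems like undesirable behavior.

There is a related pull request that seems to fix this behavior, but it was only recently merged. It looks like the fix should be in pandas 0.18.1.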
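To check how a given installation behaves, the frames from the question can be rebuilt and the two results compared. This is a minimal sketch that assumes a recent pandas, where the ordered categorical is declared through `pd.CategoricalDtype` rather than the `astype('category', categories=...)` form used above, and where `groupby` accepts `observed`. Neither of those spellings comes from the original thread.

```python
import pandas as pd

# Same data as in the question
cf = pd.DataFrame({'sc': ['b', 'b', 'c', 'd'],
                   'nn': [1, 2, 3, 4],
                   'mvl': [10, 20, 30, 40]})

# Assumption: a pandas version that provides pd.CategoricalDtype
cat_type = pd.CategoricalDtype(categories=['b', 'c', 'd'], ordered=True)
cf1 = cf.copy()  # copy so the original frame keeps its string column
cf1['sc'] = cf1['sc'].astype(cat_type)

df = cf.groupby('sc').mean()
# observed=True is passed explicitly; every category occurs in the data here
df1 = cf1.groupby('sc', observed=True).mean()

plain_result = df.loc['b', 'mvl']
cat_result = df1.loc['b', 'mvl']

print(pd.__version__)
print(type(df1.index).__name__)   # index type produced by the categorical key
print(type(plain_result).__name__, type(cat_result).__name__)
```

If the last line prints two different type names, the installation still shows the behavior described in the question.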

Source

2016-06-11 22:20:32 BrenBarn

This bug was fixed in version 0.18.1, as shown in the PR. – Jeff

It may contradict the docs, which imply that a Series should be returned when slicing with 'loc'. http://pandas.pydata.org/pandas-docs/stable/indexing.html#basics – Alexander
