Python pandas groupby object apply method adds index

This question is a follow-up to reading "Python pandas groupby object apply method duplicates first group".

I got the answer there and tried some experiments of my own, for example:

import pandas as pd 
from cStringIO import StringIO 
s = '''c1 c2 c3 
1 2 3 
4 5 6''' 
df = pd.read_csv(StringIO(s), sep=' ') 
print df 
def f2(df): 
    print df.iloc[:] 
    print "--------" 
    return df.iloc[:] 
df2 = df.groupby(['c1']).apply(f2) 
print "======" 
print df2

This gives the expected output:

   c1  c2  c3
0   1   2   3
1   4   5   6
   c1  c2  c3
0   1   2   3
--------
   c1  c2  c3
0   1   2   3
--------
   c1  c2  c3
1   4   5   6
--------
======
   c1  c2  c3
0   1   2   3
1   4   5   6

However, when I return df.iloc[0:] instead:

def f3(df): 
    print df.iloc[0:] 
    print "--------" 
    return df.iloc[0:] 
df3 = df.groupby(['c1']).apply(f3) 
print "======" 
print df3

I get an extra index level:

   c1  c2  c3
0   1   2   3
--------
   c1  c2  c3
0   1   2   3
--------
   c1  c2  c3
1   4   5   6
--------
======
      c1  c2  c3
c1
1  0   1   2   3
4  1   4   5   6

I did some searching and suspect this means the two cases take different code paths?

Source

2015-11-05 ntg

The difference is that iloc[:] returns the object itself, while iloc[0:] returns a view of the object. Take a look:

>>> df.iloc[:] is df 
True 

>>> df.iloc[0:] is df 
False

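Where this makes a difference is that, within the groupby, each group has a name attribute reflecting the grouping. When your function returns an object that has this name attribute, no index is added to the result; if you return an object without this name attribute, an index is added to track which group each piece came from.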
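To make the name attribute concrete, here is a minimal sketch written for Python 3 and a recent pandas (an assumption; the post itself uses Python 2 and a 2015-era pandas). The helper record_name and the list seen are hypothetical names used only for illustration. The non-key columns are selected before apply purely to sidestep version-specific warnings about grouping columns.

```python
import pandas as pd

df = pd.DataFrame({"c1": [1, 4], "c2": [2, 5], "c3": [3, 6]})

# Each entry: (group key seen inside apply, does a slice still carry .name?)
seen = []

def record_name(group):
    # apply() attaches the group key to each chunk as `group.name`.
    sliced = group.iloc[0:]  # a new object, built from the chunk
    seen.append((group.name, hasattr(sliced, "name")))
    return group

df.groupby("c1")[["c2", "c3"]].apply(record_name)

# De-duplicate, since some pandas versions call the function twice
# on the first group.
group_names = sorted({name for name, _ in seen})
slice_keeps_name = any(kept for _, kept in seen)
print(group_names, slice_keeps_name)
```

In this sketch every chunk reports its key through group.name, while the sliced copy no longer has the attribute, which matches the explanation above.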

Interestingly, you can force the iloc[:] behavior for iloc[0:] by explicitly setting the group's name attribute before returning:

def f(x): 
    out = x.iloc[0:] 
    out.name = x.name 
    return out 

df.groupby('c1').apply(f) 
#    c1  c2  c3
# 0   1   2   3
# 1   4   5   6

My guess is that the no-index behavior for outputs that carry the name is basically a special case meant to make df.groupby(col).apply(lambda x: x) a no-op.
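If the goal is simply to control whether the group key shows up in the result's index, pandas also exposes a group_keys argument on groupby. This is not part of the explanation above, and the default behavior has varied between pandas versions, so treat the following as a sketch for Python 3 and a recent pandas, not as a description of the code path discussed here.

```python
import pandas as pd

df = pd.DataFrame({"c1": [1, 4], "c2": [2, 5], "c3": [3, 6]})

# group_keys=False asks pandas not to prepend the group key to the index,
# even though iloc[0:] returns a new object for each group.
flat = (
    df.groupby("c1", group_keys=False)[["c2", "c3"]]
    .apply(lambda g: g.iloc[0:])
)
print(flat.index.nlevels)
```

Passing the option explicitly avoids relying on whether the returned object happens to carry the name attribute.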

Source

2015-11-05 14:16:53 jakevdp

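Seems exactly right (I also tried out = x.iloc[0:1]; out.name = x.name, and got the extra index). Also, cool video on Scikit-Learn, you rock :) – ntg

I also tried out = x.iloc[0:1]; out.name = x.name, and got the extra index, but only when the returned result would differ because there are duplicate c1 values. – ntg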
