Slice pandas DataFrame by groups of consecutive values

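I have a DataFrame containing sections of consecutive values that eventually "skip" (that is, increase by more than 1). I would like to split the DataFrame at those skips, similar to what groupby does (the letter index is just for show):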

A 
a 1 
b 2 
c 3 
d 6 
e 7 
f 8 
g 11 
h 12 
i 13 

# would return 

a 1 
b 2 
c 3 
----- 
d 6 
e 7 
f 8 
----- 
g 11 
h 12 
i 13

Source

2014-09-30 heltonbiker

A slight improvement on the answer, for speed...

import numpy as np

for k, g in df.groupby(df['A'] - np.arange(df.shape[0])):
    print(g)
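Why this works may not be obvious at first glance, so here is a minimal sketch on the sample data from the question. It assumes column `A` holds strictly increasing integers; the names `key` and `groups` are introduced here for illustration only. Inside a run of consecutive values, the value and the row position both grow by 1, so their difference stays constant and can serve as the grouping key.

```python
import numpy as np
import pandas as pd

# Sample frame from the question: column "A" with a letter index.
df = pd.DataFrame({"A": [1, 2, 3, 6, 7, 8, 11, 12, 13]}, index=list("abcdefghi"))

# Value minus row position is constant within a run of consecutive values
# (assumption: "A" is a strictly increasing integer column).
key = df["A"] - np.arange(len(df))
print(key.tolist())  # [1, 1, 1, 3, 3, 3, 5, 5, 5]

# Collect each run as its own DataFrame instead of printing it.
groups = [g for _, g in df.groupby(key)]
for g in groups:
    print(g.index.tolist())
```

If the column is not monotonically increasing, two separate runs could end up with the same key, so the key should be checked before relying on it.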

Source

2014-09-30 14:58:03 ZJS

Very, very clever... thanks – heltonbiker 2014-09-30 16:12:41

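We can use shift to check whether the difference between consecutive rows is greater than 1, and then build the list of index labels where each group starts: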

In [128]: 
# list comprehension of the indices where the value difference is larger than 1, have to add the first row index also 
index_list = [df.iloc[0].name] + list(df[(df.value - df.value.shift()) > 1].index) 
index_list 
Out[128]: 
['a', 'd', 'g']
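For readers who want to run this step on its own, the following is a self-contained sketch. It assumes, as the transcript above does, that the column is called `value` and the index is named `index` (the question itself calls the column `A`); `step` and `starts` are illustrative names.

```python
import pandas as pd

# Same data as the question, with the column and index names used in this answer.
df = pd.DataFrame(
    {"value": [1, 2, 3, 6, 7, 8, 11, 12, 13]},
    index=pd.Index(list("abcdefghi"), name="index"),
)

# shift() moves every value down one row, so this is "row minus previous row".
# The first row has no predecessor and yields NaN, which never compares > 1.
step = df["value"] - df["value"].shift()

# Group starts: the first label plus every label where the step exceeds 1.
starts = [df.index[0]] + step[step > 1].index.tolist()
print(starts)  # ['a', 'd', 'g']
```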

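We want to build a list of tuple pairs for the ranges we are interested in. Note that label slicing in pandas includes both the beginning and the end index values, so for each range we have to find the label of the row just before the next start and use it as the end label: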

In [170]: 

final_range=[] 
for i in range(len(index_list)): 
    # handle last range value 
    if i == len(index_list) - 1: 
        final_range.append((index_list[i], df.iloc[-1].name)) 
    else: 
        final_range.append((index_list[i], df.iloc[np.searchsorted(df.index, df.loc[index_list[i + 1]].name) - 1].name)) 

final_range 

Out[170]: 
[('a', 'c'), ('d', 'f'), ('g', 'i')]

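I use numpy's searchsorted to find the integer position in the index where the next start label would be inserted, and then subtract 1 to get the index label of the previous row.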
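The lookup can be tried in isolation with the sketch below. It assumes the index is sorted, which `searchsorted` requires, and reuses the `value`/`index` naming from this answer; `pos` and `prev_label` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"value": [1, 2, 3, 6, 7, 8, 11, 12, 13]},
    index=pd.Index(list("abcdefghi"), name="index"),
)

# Integer position at which the label "d" sits in the sorted index.
pos = int(np.searchsorted(df.index, "d"))
print(pos)  # 3

# One position earlier is the last row of the previous group.
prev_label = df.index[pos - 1]
print(prev_label)  # c

# Label slicing is inclusive on both ends, so this selects rows a..c.
print(df.loc["a":prev_label])
```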

In [171]: 
# now print 
for r in final_range: 
    print(df[r[0]:r[1]]) 
       value
index
a          1
b          2
c          3
       value
index
d          6
e          7
f          8
       value
index
g         11
h         12
i         13

Source

2014-09-30 13:21:09 EdChum

My two cents, just for the fun of it.

In [15]: 

for grp, val in df.groupby((df.diff()-1).fillna(0).cumsum().A): 
    print(val) 
    A 
a 1 
b 2 
c 3 
    A 
d 6 
e 7 
f 8 
    A 
g 11 
h 12 
i 13
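To see what the grouping key looks like, here is a small sketch on the question's data. It assumes column `A` holds increasing integers; `key` and `run_id` are names introduced here. The second key is a common variant rather than part of the answer above: it flags every row whose step is not exactly 1 and counts those flags.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 6, 7, 8, 11, 12, 13]}, index=list("abcdefghi"))

# diff() is 1 inside a run, so diff() - 1 is 0 there and positive at each skip.
# The running sum stays flat within a run and steps up at every skip.
key = (df["A"].diff() - 1).fillna(0).cumsum()
print(key.tolist())  # [0.0, 0.0, 0.0, 2.0, 2.0, 2.0, 4.0, 4.0, 4.0]

# Variant (not from the answer): count the rows that start a new run.
# The first row's diff is NaN, which is != 1, so numbering starts at 1.
run_id = (df["A"].diff() != 1).cumsum()
print(run_id.tolist())  # [1, 1, 1, 2, 2, 2, 3, 3, 3]
```

Either series can be passed to `df.groupby(...)` to obtain the three sub-frames.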

Source

2014-09-30 14:52:45
