2016-10-31

Repeated nested lists from a Python pandas DataFrame

This is my DataFrame, for example:

    requesttime               checkinperiod
0   2016-10-16T14:53:58.000Z  8
1   2016-10-16T22:53:22.000Z  8
2   2016-10-18T14:52:22.000Z  8
3   2016-10-18T06:53:08.000Z  8
4   2016-10-16T06:53:37.000Z  8
5   2016-10-15T22:53:14.000Z  8
6   2016-10-19T22:51:51.000Z  8
7   2016-10-22T10:16:57.000Z  12
8   2016-10-20T10:54:37.000Z  12
9   2016-10-20T06:51:42.000Z  12
10  2016-10-10T22:44:17.000Z  24
11  2016-10-13T22:47:26.000Z  8
12  2016-10-14T14:53:27.000Z  8
13  2016-10-14T22:53:58.000Z  8
14  2016-10-15T06:53:28.000Z  8
15  2016-10-14T06:53:58.000Z  8
16  2016-10-10T16:38:28.000Z  24
17  2016-10-17T06:53:50.000Z  8
18  2016-10-17T14:53:12.000Z  8
19  2016-10-19T14:51:53.000Z  8
20  2016-10-17T22:53:44.000Z  8
21  2016-10-15T14:53:50.000Z  8
22  2016-10-18T22:52:39.000Z  8
23  2016-10-12T22:27:51.000Z  24
24  2016-10-11T23:05:57.000Z  24
25  2016-10-19T06:52:53.000Z  8
26  2016-10-21T10:09:09.000Z  12
27  2016-10-21T22:17:15.000Z  12
28  2016-10-22T22:16:53.000Z  12
29  2016-10-20T23:02:13.000Z  12

Desired output (one inner list per run of consecutive rows that share the same checkinperiod):

{ 

8 : [ 
     [2016-10-16T14:53:58.000Z, 2016-10-16T22:53:22.000Z, 2016-10-18T14:52:22.000Z, 2016-10-18T06:53:08.000Z, 2016-10-16T06:53:37.000Z, 2016-10-15T22:53:14.000Z, 2016-10-19T22:51:51.000Z], 
     [2016-10-13T22:47:26.000Z, 2016-10-14T14:53:27.000Z, 2016-10-14T22:53:58.000Z, 2016-10-15T06:53:28.000Z, 2016-10-14T06:53:58.000Z], 
     [2016-10-17T06:53:50.000Z, 2016-10-17T14:53:12.000Z, 2016-10-19T14:51:53.000Z, 2016-10-17T22:53:44.000Z, 2016-10-15T14:53:50.000Z, 2016-10-18T22:52:39.000Z], 
     [2016-10-19T06:52:53.000Z] 
], 
12: [ 
     [2016-10-22T10:16:57.000Z, 2016-10-20T10:54:37.000Z, 2016-10-20T06:51:42.000Z], 
     [2016-10-21T10:09:09.000Z, 2016-10-21T22:17:15.000Z, 2016-10-22T22:16:53.000Z, 2016-10-20T23:02:13.000Z] 
], 
24: [ 
     [2016-10-10T22:44:17.000Z], 
     [2016-10-10T16:38:28.000Z], 
     [2016-10-12T22:27:51.000Z, 2016-10-11T23:05:57.000Z] 
] 
} 

Thanks, 薩米特
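For reference, a minimal sketch of what the desired output appears to describe, assuming each inner list is one run of consecutive rows with the same checkinperiod, in table order, and that every row of the table should appear. It uses the standard library's itertools.groupby instead of pandas (a swapped-in technique, not the asker's or answerer's code); the names rows and result are illustrative.

```python
from itertools import groupby

# Rows from the table above, in order: (requesttime, checkinperiod)
rows = [
    ("2016-10-16T14:53:58.000Z", 8), ("2016-10-16T22:53:22.000Z", 8),
    ("2016-10-18T14:52:22.000Z", 8), ("2016-10-18T06:53:08.000Z", 8),
    ("2016-10-16T06:53:37.000Z", 8), ("2016-10-15T22:53:14.000Z", 8),
    ("2016-10-19T22:51:51.000Z", 8), ("2016-10-22T10:16:57.000Z", 12),
    ("2016-10-20T10:54:37.000Z", 12), ("2016-10-20T06:51:42.000Z", 12),
    ("2016-10-10T22:44:17.000Z", 24), ("2016-10-13T22:47:26.000Z", 8),
    ("2016-10-14T14:53:27.000Z", 8), ("2016-10-14T22:53:58.000Z", 8),
    ("2016-10-15T06:53:28.000Z", 8), ("2016-10-14T06:53:58.000Z", 8),
    ("2016-10-10T16:38:28.000Z", 24), ("2016-10-17T06:53:50.000Z", 8),
    ("2016-10-17T14:53:12.000Z", 8), ("2016-10-19T14:51:53.000Z", 8),
    ("2016-10-17T22:53:44.000Z", 8), ("2016-10-15T14:53:50.000Z", 8),
    ("2016-10-18T22:52:39.000Z", 8), ("2016-10-12T22:27:51.000Z", 24),
    ("2016-10-11T23:05:57.000Z", 24), ("2016-10-19T06:52:53.000Z", 8),
    ("2016-10-21T10:09:09.000Z", 12), ("2016-10-21T22:17:15.000Z", 12),
    ("2016-10-22T22:16:53.000Z", 12), ("2016-10-20T23:02:13.000Z", 12),
]

result = {}
# Without sorting, groupby() yields one group per run of consecutive equal keys
for period, run in groupby(rows, key=lambda row: row[1]):
    # Append this run's timestamps as a new inner list under its period
    result.setdefault(period, []).append([time for time, _ in run])

print(result)
```

Because the input is not sorted first, a period that reappears later in the table (such as 8 after the 24 at row 10) starts a new inner list instead of being merged into the earlier one.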


Um, what? Maybe read this: http://stackoverflow.com/help/mcve? – MooingRawr

Answer

import pandas as pd 

# make sample data 
col = 'checkinperiod' 
df = pd.DataFrame([['a', 8], ['b', 8], ['c', 8],['c', 12], ['d', 8], ['e', 12], ['f', 12]], 
        columns=['requesttime', col]) 
print(df)

    requesttime checkinperiod 
0   a    8 
1   b    8 
2   c    8 
3   c    12 
4   d    8 
5   e    12 
6   f    12 

# shift the column down one row and flag each row whose value differs from the previous row;
# cumsum() then turns every change point into a new run id
df['group'] = (df[col].shift(1) != df[col]).astype(int).cumsum()
print(df)

    requesttime checkinperiod group 
0   a    8  1 
1   b    8  1 
2   c    8  1 
3   c    12  2 
4   d    8  3 
5   e    12  4 
6   f    12  4 

# group by those groups and combine the results 
df_grouped = pd.DataFrame(df.groupby([col, 'group']).apply(
    lambda g: list(g['requesttime'])))
df_grouped = df_grouped.reset_index().drop('group', axis=1)
print(df_grouped)

    checkinperiod   0 
0    8 [a, b, c] 
1    8  [d] 
2    12  [c] 
3    12  [e, f] 

result = df_grouped.groupby(col).apply(lambda g: list(g[0])).to_dict()
print(result)

{8: [['a', 'b', 'c'], ['d']], 12: [['c'], ['e', 'f']]} 

Inspired by [1]
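The two-stage groupby in the answer can also be collapsed into a single pass over the run ids. A minimal sketch, assuming the same column names as the answer; runs_by_value is a hypothetical helper, not part of pandas.

```python
import pandas as pd


def runs_by_value(df, value_col, item_col):
    """Hypothetical helper: map each value of value_col to a list of runs,
    where a run is the list of item_col values from consecutive rows that
    share the same value_col."""
    # True wherever a row differs from the row above; cumsum() numbers the runs
    run_id = (df[value_col] != df[value_col].shift()).cumsum()
    result = {}
    # Run ids increase down the frame, so groups come out in row order
    for _, run in df.groupby(run_id, sort=False):
        # tolist() yields plain Python scalars rather than NumPy ones
        value = run[value_col].tolist()[0]
        result.setdefault(value, []).append(run[item_col].tolist())
    return result


col = 'checkinperiod'
df = pd.DataFrame([['a', 8], ['b', 8], ['c', 8], ['c', 12], ['d', 8], ['e', 12], ['f', 12]],
                  columns=['requesttime', col])
result = runs_by_value(df, col, 'requesttime')
print(result)
```

Grouping only by the run id works because each run, by construction, has a single checkinperiod, so the value can be read from the run's first row.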


I tried the same code with my data, but it didn't give the correct result:


Tried this: df = pd.DataFrame([['2016-10-16T14:53:58.000Z',8],['2016-10-16T22:53:22.000Z',8],['2016-10-18T14:52:22.000Z',8],['2016-10-18T06:53:08.000Z',8],['2016-10-16T06:53:37.000Z',8],['2016-10-15T22:53:14.000Z',8],['2016-10-19T22:51:51.000Z',8],['2016-10-22T10:16:57.000Z',12],['2016-10-20T10:54:37.000Z',12],['2016-10-20T06:51:42.000Z',12],['2016-10-10T22:44:17.000Z',24],['2016-10-13T22:47:26.000Z',8],['2016-10-14T14:53:27.000Z',8],['2016-10-14T22:53:58.000Z',8],['2016-10-15T06:53:28.000Z',8],['2016-10-14T06:53:58.000Z',8]], ['requesttime',col])
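One thing worth checking in the snippet above: the column labels are passed as the second positional argument, and the second positional parameter of pd.DataFrame is index, not columns. A minimal sketch of the difference, using only the first three rows of that data for brevity; the names data and positional_failed are illustrative.

```python
import pandas as pd

col = 'checkinperiod'
# First three rows of the data from the comment above
data = [['2016-10-16T14:53:58.000Z', 8],
        ['2016-10-16T22:53:22.000Z', 8],
        ['2016-10-18T14:52:22.000Z', 8]]

# Positionally, ['requesttime', col] is taken as `index` (two row labels
# for three rows), so the constructor rejects it with a ValueError
try:
    pd.DataFrame(data, ['requesttime', col])
    positional_failed = False
except ValueError:
    positional_failed = True

# Passing the labels by keyword gives the intended two columns
df = pd.DataFrame(data, columns=['requesttime', col])
print(df)
```

This may or may not be the only reason the answer's code failed on the real data, but it is a likely first obstacle.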


It looks like if there are more than 2 values, it creates another list. Try this: df = pd.DataFrame([['a',8],['b',8],['c',8],['c',12],['d',8],['e',12],['f',12]], columns=['requesttime',col])
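To check whether runs are being split by length, a minimal sketch that counts how many rows each run covers on that same sample data, reusing the answer's run-id expression; sizes and periods are illustrative names.

```python
import pandas as pd

col = 'checkinperiod'
df = pd.DataFrame([['a', 8], ['b', 8], ['c', 8], ['c', 12], ['d', 8], ['e', 12], ['f', 12]],
                  columns=['requesttime', col])

# Same run id as in the answer: it increments each time checkinperiod changes
df['group'] = (df[col].shift(1) != df[col]).astype(int).cumsum()

# Number of rows in each run, and the checkinperiod of each run
sizes = df.groupby('group').size().tolist()
periods = df.groupby('group')[col].first().tolist()
print(list(zip(periods, sizes)))
```

On this data the first run of 8 covers three rows, so runs are not capped at two; 'd' lands in a separate list because a 12 sits between it and 'c', which ends the earlier run of 8.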