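Pandas assign value from one member of group to all other members

I have searched for this but still can't get my head round groups, so...

My data (dataFrame) looks like this (* denotes the desired output):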

 
       id  parentid  page_number  is_critical_page  page_number_of_critical*  page_numbers_not_critical*
    0   1         1            1              True                         1                     2,3,4,5
    1   2         1            2             False                         1                     2,3,4,5
    2   3         1            3             False                         1                     2,3,4,5
    3   4         1            4             False                         1                     2,3,4,5
    4   5         1            5             False                         1                     2,3,4,5
    5   6         2            1             False                         2                         1,3
    6   7         2            2              True                         2                         1,3
    7   8         2            3             False                         2                         1,3
    8   9         3            1             False                        -1                           1
    9  10         4            1              True                         1                          -1

I want to:

Group the rows by parentid:
```
dgroups=dataFrame.groupby('parentid') 
```

Apply an arbitrary operation to the groups:

def func(grp): 
    grp['has_critical_page'] = grp['is_critical_page'].sum()>0 # simple operation 
    ### Apply operation here to generate: 
    ### ?? grp['page_number_of_critical*'] = ... ?? # is a scalar 
    ### ?? grp['page_numbers_not_critical'] = ... ?? # is a list 
    return grp 

dgroups.apply(func) 

print(dgroups.describe())

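The -1's are N/A's; they could be NaN, None, -99 or any other special value.

I don't know whether to use apply, transform, filter etc., or whether to apply(..) func to the dataFrame or to the rows of these groups.

Trying to avoid loops, of course.... Thanks!

PS Bonus points for how to handle multiple hits of is_critical_page within a group......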

Source

2017-08-03 jtlz2

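PPS I don't know how to format the data table... – jtlz2

Thanks to whoever fixed the table... – jtlz2

One way is to create dictionaries and map them: convert page_number to strings, join them per parentid group, turn each result into a dictionary, and then map that dictionary onto parentid, i.e.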

df['page_number'] = df['page_number'].astype(str) 
critical_pages=df[df.is_critical_page] 
not_critical_pages=df[~df.is_critical_page] 

not_critical_pages = not_critical_pages.groupby('parentid')['page_number'].apply(','.join).to_dict() 
critical_pages = critical_pages.groupby('parentid')['page_number'].apply(','.join).to_dict() 

df['page_number_of_critical*'] = df['parentid'].map(critical_pages) 
df['not_page_number_of_critical*'] = df['parentid'].map(not_critical_pages)

Output:

 
   id  parentid  page_number  is_critical_page  page_number_of_critical*  \
0   1         1            1              True                         1
1   2         1            2             False                         1
2   3         1            3             False                         1
3   4         1            4             False                         1
4   5         1            5             False                         1
5   6         2            1             False                         2
6   7         2            2              True                         2
7   8         2            3             False                         2
8   9         3            1             False                       NaN
9  10         4            1              True                         1

   not_page_number_of_critical*
0                       2,3,4,5
1                       2,3,4,5
2                       2,3,4,5
3                       2,3,4,5
4                       2,3,4,5
5                           1,3
6                           1,3
7                           1,3
8                             1
9                           NaN

You can use fillna to fill in whatever value you want.

You can also use apply, i.e.
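As a minimal, self-contained sketch of the map-then-fillna idea: the DataFrame below is rebuilt by hand from the table in the question, the output column names drop the asterisk, and the string "-1" is just one assumed choice of placeholder (NaN, None or -99 would work the same way).

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "id": range(1, 11),
    "parentid": [1, 1, 1, 1, 1, 2, 2, 2, 3, 4],
    "page_number": [1, 2, 3, 4, 5, 1, 2, 3, 1, 1],
    "is_critical_page": [True, False, False, False, False,
                         False, True, False, False, True],
})

# Strings are needed so that ",".join can concatenate the page numbers.
df["page_number"] = df["page_number"].astype(str)

# One comma-joined string per parentid, built separately for each flag value.
critical = (df[df["is_critical_page"]]
            .groupby("parentid")["page_number"].apply(",".join).to_dict())
not_critical = (df[~df["is_critical_page"]]
                .groupby("parentid")["page_number"].apply(",".join).to_dict())

# Groups missing from a dictionary map to NaN; fillna swaps in the placeholder.
df["page_number_of_critical"] = df["parentid"].map(critical).fillna("-1")
df["page_numbers_not_critical"] = df["parentid"].map(not_critical).fillna("-1")

print(df)
```

Because the pages are joined with a comma, a group with several critical pages would simply get a comma-separated string in page_number_of_critical, which is one way to handle the multiple-hits case from the PS.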

df['page_number'] = df['page_number'].astype(str) 

crn_pages = df.groupby(['parentid','is_critical_page'])['page_number'].apply(','.join).to_dict() 

df['page_number_of_critical*'] = df.apply(lambda x: crn_pages[x['parentid'],True] if (x['parentid'],True) in crn_pages else -1 ,axis=1) 
df['not_page_number_of_critical*'] = df.apply(lambda x: crn_pages[x['parentid'],False] if (x['parentid'],False) in crn_pages else -1 ,axis=1)
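If the row-wise lambda turns out to be slow on a large frame, the same two-key groupby can probably be reshaped and joined back instead. This is only a sketch of an alternative, not part of the approach above: the sample data is rebuilt from the question, the column names without the asterisk and the "-1" placeholder are assumptions, and it relies on both True and False occurring somewhere in is_critical_page.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "id": range(1, 11),
    "parentid": [1, 1, 1, 1, 1, 2, 2, 2, 3, 4],
    "page_number": [1, 2, 3, 4, 5, 1, 2, 3, 1, 1],
    "is_critical_page": [True, False, False, False, False,
                         False, True, False, False, True],
})
df["page_number"] = df["page_number"].astype(str)

# Join pages per (parentid, is_critical_page), then pivot the flag into columns:
# one row per parentid, one column for False and one for True.
joined = (
    df.groupby(["parentid", "is_critical_page"])["page_number"]
      .apply(",".join)
      .unstack("is_critical_page")
      .rename(columns={True: "page_number_of_critical",
                       False: "page_numbers_not_critical"})
      .fillna("-1")  # groups with no critical (or no non-critical) page
)

# Broadcast the per-group values back onto every row of the group.
result = df.join(joined, on="parentid")
print(result)
```

The join keeps the original row order and index of df, so result can be used as a drop-in replacement for the frame built with apply.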

Hope it helps.

Source

2017-08-03 07:53:09 Dark

Works for me - thanks very much! – jtlz2

Glad to help, @jtlz2. – Dark
