2017-05-05

Group by min and fill NAs with value from another column, part 2

I have a dataframe like this:

    import numpy as np
    import pandas as pd

    mydf = pd.DataFrame(data={
        'uid': [1, 1, 1, 2, 2, 3, 4, 4, 4, 4],
        'pagename': ['home', 'blah', 'blah', 'home', 'blah', 'blah', 'blah', 'home', 'blah', 'blah'],
        'startpage': [np.nan, np.nan, np.nan, 'home', 'home', 'blah', np.nan, np.nan, np.nan, np.nan],
        'date_time': [0, 1, 2, 5, 9, 1, 1, 2, 3, 4],
        'page_event': [0, 0, 0, 0, 0, 0, 10, 0, 0, 10],
    })

I want to end up with this dataframe:

    endingdf = pd.DataFrame(data={
        'uid': [1, 1, 1, 2, 2, 3, 4, 4, 4, 4],
        'pagename': ['home', 'blah', 'blah', 'home', 'blah', 'blah', 'blah', 'home', 'blah', 'blah'],
        'startpage': [np.nan, np.nan, np.nan, 'home', 'home', 'blah', np.nan, np.nan, np.nan, np.nan],
        'date_time': [0, 1, 2, 5, 9, 1, 1, 2, 3, 4],
        'page_event': [0, 0, 0, 0, 0, 0, 10, 0, 0, 10],
        'new_start_page': ['home', 'home', 'home', 'home', 'home', 'blah', 'home', 'home', 'home', 'home'],
    })

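What I want to do is group by uid and, where startpage is NULL, use the pagename of the first visit (min date_time), but only if that visit has page_event = 0. So if the first pagename has page_event = 10, skip it and move on until a row with page_event = 0.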
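A minimal sketch of the rule as described, assuming the same `mydf` as above and that the result should go into a separate `new_start_page` column (as in `endingdf`) rather than overwrite `startpage`. It swaps in `idxmin` on `date_time` over the `page_event == 0` rows, so it follows the "min date_time" wording directly instead of relying on row order; it is an alternative, not the answer's method.

```python
import numpy as np
import pandas as pd

mydf = pd.DataFrame(data={
    'uid': [1, 1, 1, 2, 2, 3, 4, 4, 4, 4],
    'pagename': ['home', 'blah', 'blah', 'home', 'blah', 'blah', 'blah', 'home', 'blah', 'blah'],
    'startpage': [np.nan, np.nan, np.nan, 'home', 'home', 'blah', np.nan, np.nan, np.nan, np.nan],
    'date_time': [0, 1, 2, 5, 9, 1, 1, 2, 3, 4],
    'page_event': [0, 0, 0, 0, 0, 0, 10, 0, 0, 10],
})

# Only rows with page_event == 0 are eligible to be a starting page.
eligible = mydf[mydf['page_event'] == 0]

# For each uid, the index label of the eligible row with the smallest date_time.
first_idx = eligible.groupby('uid')['date_time'].idxmin()

# Look up that row's pagename for every row belonging to the same uid.
first_page = mydf['uid'].map(first_idx).map(mydf['pagename'])

# Keep an existing startpage; fill only the missing ones, in a new column.
mydf['new_start_page'] = mydf['startpage'].fillna(first_page)
print(mydf)
```

A uid with no `page_event == 0` rows is simply absent from `first_idx`, so its missing values stay NaN.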

Answer

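    import pandas as pd

    e = mydf.page_event
    p = mydf.pagename
    s = mydf.startpage
    u = mydf.uid
    # For each uid, the index label of the first row whose page_event is not 10
    m = e.mask(e == 10).groupby(u).apply(pd.Series.first_valid_index)

    # Fill missing startpage values with that row's pagename
    # (assign back instead of inplace=True on s, which may not update mydf under copy-on-write)
    mydf['startpage'] = s.fillna(u.map(m).map(p))

    print(mydf)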

       date_time  page_event pagename startpage  uid
    0          0           0     home      home    1
    1          1           0     blah      home    1
    2          2           0     blah      home    1
    3          5           0     home      home    2
    4          9           0     blah      home    2
    5          1           0     blah      blah    3
    6          1          10     blah      home    4
    7          2           0     home      home    4
    8          3           0     blah      home    4
    9          4          10     blah      home    4
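The `first_valid_index` step picks the first eligible row in the frame's current row order, which matches "min date_time" only when rows are already sorted by `date_time` within each `uid` (as they are in `mydf`). A sketch of the same steps on a deliberately shuffled copy, sorting by `uid` and `date_time` first; the shuffle is hypothetical and only there to illustrate the caveat.

```python
import numpy as np
import pandas as pd

mydf = pd.DataFrame(data={
    'uid': [1, 1, 1, 2, 2, 3, 4, 4, 4, 4],
    'pagename': ['home', 'blah', 'blah', 'home', 'blah', 'blah', 'blah', 'home', 'blah', 'blah'],
    'startpage': [np.nan, np.nan, np.nan, 'home', 'home', 'blah', np.nan, np.nan, np.nan, np.nan],
    'date_time': [0, 1, 2, 5, 9, 1, 1, 2, 3, 4],
    'page_event': [0, 0, 0, 0, 0, 0, 10, 0, 0, 10],
})

# Hypothetical: scramble the row order to mimic unsorted input.
shuffled = mydf.sample(frac=1, random_state=0)

# first_valid_index follows row order, so restore (uid, date_time) order first.
ordered = shuffled.sort_values(['uid', 'date_time'])

e = ordered['page_event']
p = ordered['pagename']
u = ordered['uid']

# Same steps as the answer: hide page_event == 10, then take each uid's first remaining index label.
m = e.mask(e == 10).groupby(u).apply(pd.Series.first_valid_index)

# Fill missing startpage values, then return to the original row order.
filled = ordered['startpage'].fillna(u.map(m).map(p)).sort_index()
print(m)
print(filled)
```

Because the original index labels survive the shuffle and sort, `m` still maps each uid to a row label of `mydf`, and `sort_index()` puts the filled column back in the original order.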