Extracting a new DataFrame in Python

topic student level 
    1  a  1  
    1  b  2  
    1  a  3  
    2  a  1  
    2  b  2  
    2  a  3  
    2  b  4  
    3  c  1  
    3  b  2  
    3  c  3  
    3  a  4  
    3  b  5
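
For anyone who wants to reproduce this, the table above can be built roughly as follows. This is only a sketch: the variable name `df` and the use of pandas are assumed from the code later in the thread.

```python
import pandas as pd

# Sample data copied row by row from the table above.
df = pd.DataFrame({
    "topic":   [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3],
    "student": ["a", "b", "a", "a", "b", "a", "b", "c", "b", "c", "a", "b"],
    "level":   [1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5],
})

print(df)
```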

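It contains a column level, which specifies who started the topic and who replied to it. If the level is 1, the student started the topic. If the level is 2, the student replied to the student who started the topic. If the level is 3, the student replied to the student at level 2, and so on.

I want to extract a new dataframe that represents the communication between students through topics. It should contain three columns: "student source", "student destination" and "reply count". The reply count is the number of times the student destination "directly" replied to the student source.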

I should get something like:

st_source st_dest reply_count 
     a  b  4 
     a  c  0 
     b  a  2 
     b  c  1 
     c  a  1 
     c  b  1

I tried to find the first two columns using this code:

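import numpy as np

idx_cols = ['topic']
std_cols = ['student_x', 'student_y']
# Self-merge on topic to pair up all students who appear in the same topic
df1 = df.merge(df, on=idx_cols)
# Keep only the pairs made of two different students
df2 = df1.loc[df1.student_x != df1.student_y, idx_cols + std_cols]

# Sort the two student columns within each row
df2.loc[:, std_cols] = np.sort(df2.loc[:, std_cols])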

Does anyone have suggestions for the third column?

Thanks in advance!

Source

2017-05-06 Sheron

What have you tried? – blackmamba

@blackmamba check it now.. – Sheron

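This assumes your data is already sorted by topic and level, as in your sample, and has the default integer index, because each row is compared with the row directly above it. If not, sort it and reset the index first.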

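import numpy as np
import pandas as pd

# Generate the reply_count for each valid combination by comparing the current row and the row above.
# df.ix has been removed from pandas, so df.loc is used here; .tolist() gives a list of rows in both old and new pandas versions.
count_list = df.apply(lambda x: [df.loc[x.name - 1, 'student'] if x.name > 0 else np.nan, x.student, x.level > 1], axis=1).values.tolist()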

# Create a count dataframe using the count_list data
df_count = pd.DataFrame(columns=['st_source', 'st_dest', 'reply_count'], data=count_list)

# Aggregate and sum all counts belonging to a source-dest pair, then remove rows with the same source and dest.
df_count = df_count.groupby(['st_source', 'st_dest']).sum().astype(int).reset_index()[lambda x: x.st_source != x.st_dest]

print(df_count) 
Out[218]: 
    st_source st_dest reply_count 
1   a  b   4 
2   b  a   2 
3   b  c   1 
4   c  a   1 
5   c  b   1
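
The same row-above comparison can probably also be written without `apply`, by shifting the student column within each topic. The following is only a sketch of that alternative, not the code used above: it assumes the sample data from the question, and the intermediate name `replies` is made up for illustration.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "topic":   [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3],
    "student": ["a", "b", "a", "a", "b", "a", "b", "c", "b", "c", "a", "b"],
    "level":   [1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5],
})

# Make sure each reply sits directly below the post it answers.
df = df.sort_values(["topic", "level"]).reset_index(drop=True)

# Within each topic, the student on the previous row is the one being replied to.
# Level-1 rows have no previous row in their topic, so they get NaN and are dropped.
replies = pd.DataFrame({
    "st_source": df.groupby("topic")["student"].shift(),
    "st_dest": df["student"],
}).dropna()

# Ignore students replying to themselves.
replies = replies[replies["st_source"] != replies["st_dest"]]

# Count the direct replies per (source, destination) pair.
df_count = (
    replies.groupby(["st_source", "st_dest"])
    .size()
    .reset_index(name="reply_count")
)

print(df_count)
```

Because the shift is done per topic, a topic starter is never paired with the last student of the previous topic, so no extra filtering on level is needed.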

Source

2017-05-06 22:16:23 Allen

Amazing! Thank you! @Allen – Sheron

btw how can I keep the rows with 0? – Sheron
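
One possible way to keep the zero-count pairs, as a sketch only: count the direct replies, then reindex the counts against every ordered pair of distinct students and fill the missing pairs with 0. It assumes the sample data from the question; the names `replies`, `counts` and `all_pairs` are made up for illustration.

```python
from itertools import permutations

import pandas as pd

# Sample data from the question, already sorted by topic and level.
df = pd.DataFrame({
    "topic":   [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3],
    "student": ["a", "b", "a", "a", "b", "a", "b", "c", "b", "c", "a", "b"],
    "level":   [1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5],
})

# Pair each student with the student on the previous row of the same topic.
replies = pd.DataFrame({
    "st_source": df.groupby("topic")["student"].shift(),
    "st_dest": df["student"],
}).dropna()

# Number of direct replies per (source, destination) pair that actually occurs.
counts = replies.groupby(["st_source", "st_dest"]).size()

# Every ordered pair of two different students, whether or not it occurs.
students = sorted(df["student"].unique())
all_pairs = pd.MultiIndex.from_tuples(
    list(permutations(students, 2)), names=["st_source", "st_dest"]
)

# Pairs missing from counts are filled with 0.
df_count = counts.reindex(all_pairs, fill_value=0).reset_index(name="reply_count")

print(df_count)
```

Reindexing against `all_pairs` also leaves out any pair where source and destination are the same student, since `permutations` never produces those.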
