2017-05-28 170 views

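Select specific rows in a DataFrame based on 2 columns in Python pandas

I have loaded data from Excel into a pandas DataFrame. I now want to select only the rows whose ASSESSMENT ID is the maximum ASSESSMENT ID for each APPID, and I want this for every UI SEQ NUMBER of that APPID.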

APPID  APPNAME  ASSESSMENT ID  UI SEQ NUMBER  QUESTION  ANSWER TEXT.
1      appname  2493           11             Question  No.
1      appname  13808          11             Question  Ctry of domicile.
1      appname  13808          11             Question  Name.
1      appname  35316          11             Question  Ctry of domicile.
1      appname  35316          11             Question  Name.
1      appname  35316          11             Question  Nationality.
1      appname  2493           12             Question  Corp name.
1      appname  2493           12             Question  Cr Br Scr.
1      appname  2493           12             Question  Inc And Assests.
1      appname  2493           12             Question  Int, Ext Reg Reports.
1      appname  13808          12             Question  Corp name.
1      appname  35316          12             Question  Corp name.
1      appname  2493           13             Question  No.
1      appname  13808          13             Question  No.
1      appname  35316          13             Question  No.
1      appname  2493           14             Question  No.
1      appname  13808          14             Question  firms Pos.
1      appname  35316          14             Question  firms Pos.

The result would be:

APPID  APPNAME  ASSESSMENT ID  UI SEQ NUMBER  QUESTION  ANSWER TEXT.
1      appname  35316          11             Question  Ctry of domicile.
1      appname  35316          11             Question  Name.
1      appname  35316          11             Question  Nationality.
1      appname  35316          12             Question  Corp name.
1      appname  35316          13             Question  No.
1      appname  35316          14             Question  firms Pos.

Please [don't post images of code (or links to them)](http://meta.stackoverflow.com/questions/285551/why-may-i-not-upload-images-of-code-on-so-when-asking-a-question) – jezrael

Apologies for posting an image, but I had no other way to post the Excel data here without losing the formatting. – vivek

Hmm, doesn't it work if you copy-paste the data and add 4 spaces before each line? – jezrael

Answers

I think you need boolean indexing, with apply creating the mask:

df1 = df[df.groupby(['APPID', 'UI SEQ NUMBER'])['ASSESSMENT ID'].apply(lambda x:x==x.max())] 
print (df1) 
    APPID  APPNAME  ASSESSMENT ID  UI SEQ NUMBER  QUESTION       ANSWER TEXT.
 3      1  appname          35316             11  Question  Ctry of domicile.
 4      1  appname          35316             11  Question              Name.
 5      1  appname          35316             11  Question       Nationality.
11      1  appname          35316             12  Question         Corp name.
14      1  appname          35316             13  Question                No.
17      1  appname          35316             14  Question         firms Pos.
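
The same mask can also be built with `transform('max')`, which returns a Series aligned to the original index. This may be the safer choice on recent pandas versions, where `groupby(...).apply` can prepend the group keys to the result's index so that the mask no longer aligns with `df`. A minimal sketch, assuming a small made-up frame modelled on the question's table (the data and the shortened column set are hypothetical, not the asker's Excel sheet):

```python
import pandas as pd

# Hypothetical sample data modelled on the question's table (not the real Excel sheet).
df = pd.DataFrame({
    "APPID": [1, 1, 1, 1, 1, 1],
    "UI SEQ NUMBER": [11, 11, 11, 11, 12, 12],
    "ASSESSMENT ID": [2493, 13808, 35316, 35316, 2493, 35316],
    "ANSWER TEXT": ["No", "Name", "Name", "Nationality", "Corp name", "Corp name"],
})

# Broadcast each group's maximum ASSESSMENT ID back onto every row of that group.
group_max = df.groupby(["APPID", "UI SEQ NUMBER"])["ASSESSMENT ID"].transform("max")

# Keep every row that carries its group's maximum, ties included.
df1 = df[df["ASSESSMENT ID"] == group_max]
print(df1)
```

Because `group_max` has one value per row of `df`, the comparison yields a plain boolean Series that can be used directly for indexing.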

Or, if you don't need all the duplicated rows, use idxmax:

df1 = df.loc[df.groupby(['APPID', 'UI SEQ NUMBER'])['ASSESSMENT ID'].idxmax()] 
print (df1) 
    APPID  APPNAME  ASSESSMENT ID  UI SEQ NUMBER  QUESTION       ANSWER TEXT.
 3      1  appname          35316             11  Question  Ctry of domicile.
11      1  appname          35316             12  Question         Corp name.
14      1  appname          35316             13  Question                No.
17      1  appname          35316             14  Question         firms Pos.
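
To make the difference from the mask approach concrete, the sketch below uses a small hypothetical frame (made-up data modelled on the question, not the asker's sheet) in which one group has two rows tied at the maximum. `idxmax` returns only the first row label per group, so one of the tied rows is dropped:

```python
import pandas as pd

# Hypothetical sample data; rows 2 and 3 tie for the maximum in group (1, 11).
df = pd.DataFrame({
    "APPID": [1, 1, 1, 1, 1, 1],
    "UI SEQ NUMBER": [11, 11, 11, 11, 12, 12],
    "ASSESSMENT ID": [2493, 13808, 35316, 35316, 2493, 35316],
    "ANSWER TEXT": ["No", "Name", "Name", "Nationality", "Corp name", "Corp name"],
})

# One row label per group: the first row that holds the group's maximum.
idx = df.groupby(["APPID", "UI SEQ NUMBER"])["ASSESSMENT ID"].idxmax()

# Select exactly those rows by label.
df1 = df.loc[idx]
print(df1)
```

This is the variant to pick when exactly one row per (APPID, UI SEQ NUMBER) pair is wanted.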

Perfect, jezrael. That solved it. I was doing the following: `df[df.groupby(['APPID','UI SEQ NUMBER'])['ASSESSMENT ID'].max()` – vivek

Then it is better to use `df1 = df.loc[df.groupby(['APPID','UI SEQ ID NUMBER'])['ASSESSMENT ID'].idxmax()]` – jezrael
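
For context on why a plain `.max()` does not work as an indexer: it returns one value per group, indexed by the group keys rather than by the row labels of `df`. If the per-group maxima are wanted anyway, one option is to merge them back onto the frame. A rough sketch on hypothetical data (the frame and the variable names are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical sample data modelled on the question's table.
df = pd.DataFrame({
    "APPID": [1, 1, 1, 1, 1, 1],
    "UI SEQ NUMBER": [11, 11, 11, 11, 12, 12],
    "ASSESSMENT ID": [2493, 13808, 35316, 35316, 2493, 35316],
    "ANSWER TEXT": ["No", "Name", "Name", "Nationality", "Corp name", "Corp name"],
})

keys = ["APPID", "UI SEQ NUMBER"]

# One maximum per group, indexed by (APPID, UI SEQ NUMBER), not by row label.
group_max = df.groupby(keys)["ASSESSMENT ID"].max()

# Turn the group keys back into columns and keep only the rows of df
# that match a (key, maximum) combination.
df1 = df.merge(group_max.reset_index(), on=keys + ["ASSESSMENT ID"])
print(df1)
```

Note that the merge produces a fresh integer index, so the original row labels are not preserved as they are with the mask or `idxmax` approaches.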
