2016-10-03

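Using isin to select the DataFrame columns named in a list

I have a DataFrame df1, and I have a list containing the names of several of df1's columns.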

df1: 
User_id month day Age year CVI ZIP sex wgt 
0   1 7 16 1977  2  NA M NaN 
1   2 7 16 1977  3  NA M NaN 
2   3 7 16 1977  2  DM F NaN 
3   4 7 16 1977  7  DM M NaN 
4   5 7 16 1977  3  DM M NaN 
...  ... ... ... ... ...  ... ... ... 
35544  35545 12 31 2002 15  AH NaN NaN 
35545  35546 12 31 2002 15  AH NaN NaN 
35546  35547 12 31 2002 10  RM F 14 
35547  35548 12 31 2002  7  DO M 51 
35548  35549 12 31 2002  5  NaN NaN NaN 

list= [u"User_id", u"day", u"ZIP", u"sex"] 

I want to create a new DataFrame df2 that contains only the columns in the list, and a DataFrame df3 that contains the columns not in the list.

I found here that I need to do:

df2=df1[df1[df1.columns[1]].isin(list)] 

But as a result I get:

Empty DataFrame 
Columns: [] 
Index: [] 
[0 rows x 9 columns] 

What am I doing wrong, and how can I get the desired result? And why does it say "9 columns" when there should be 4?
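A minimal sketch of what the attempted line actually does, using a small hypothetical df1 with the question's column names (the values are made up). df1.columns[1] is a single label (month), so df1[df1.columns[1]] is one column of values; .isin(...) then asks whether each month value equals one of the column names, which is never true. The resulting all-False Series acts as a row mask, so every row is dropped while all 9 columns are kept.

```python
import pandas as pd

cols = ["User_id", "day", "ZIP", "sex"]  # renamed from `list` to avoid shadowing the builtin

# Hypothetical toy frame with the question's 9 column names; values are made up
df1 = pd.DataFrame({
    "User_id": [1, 2, 3],
    "month": [7, 7, 12],
    "day": [16, 16, 31],
    "Age": [30, 41, 25],
    "year": [1977, 1977, 2002],
    "CVI": [2, 3, 15],
    "ZIP": ["NA", "DM", "AH"],
    "sex": ["M", "F", None],
    "wgt": [None, None, 51.0],
})

label = df1.columns[1]          # a single column label: "month"
mask = df1[label].isin(cols)    # compares month *values* against the column names
print(label)                    # month
print(mask.tolist())            # [False, False, False]

wrong = df1[mask]               # boolean Series -> row filter, not a column filter
print(wrong.shape)              # (0, 9): no rows left, all 9 columns kept

right = df1[cols]               # list of labels -> column selection
print(right.shape)              # (3, 4)
```

In short, a boolean Series inside df1[...] filters rows, whereas a list of labels selects columns.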

Sorry, but are you after 'df2 = df1[list]' in the first case? For the other you can do 'df3 = df1[df1.columns[~df1.columns.isin(list)]]' – EdChum

That's it, thanks! – Polly
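A short sketch of the approach from EdChum's comment, on a hypothetical one-row frame (only the column names matter). df1.columns.isin(...) returns one boolean per column, so negating it with ~ and indexing df1.columns gives the complement; the list is renamed cols here to avoid shadowing Python's built-in list.

```python
import pandas as pd

cols = ["User_id", "day", "ZIP", "sex"]

# Hypothetical one-row frame with the question's column names
df1 = pd.DataFrame([[0] * 9], columns=["User_id", "month", "day", "Age", "year",
                                       "CVI", "ZIP", "sex", "wgt"])

in_list = df1.columns.isin(cols)     # boolean array, one entry per column
df2 = df1[cols]                      # columns in the list (in the list's order)
df3 = df1[df1.columns[~in_list]]     # columns not in the list (in df1's order)

print(list(df2.columns))  # ['User_id', 'day', 'ZIP', 'sex']
print(list(df3.columns))  # ['month', 'Age', 'year', 'CVI', 'wgt']
```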

Answers

Solution with Index.difference:

L = [u"User_id", u"day", u"ZIP", u"sex"] 

df2 = df1[L] 
df3 = df1[df1.columns.difference(df2.columns)] 
print (df2) 
    User_id day ZIP sex 
0  0 7 NaN M 
1  1 7 NaN M 
2  2 7 DM F 
3  3 7 DM M 
4  4 7 DM M 

print (df3) 
    Age CVI month wgt year 
0 16 2  1 NaN 1977 
1 16 3  2 NaN 1977 
2 16 2  3 NaN 1977 
3 16 7  4 NaN 1977 
4 16 3  5 NaN 1977 
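Note that Index.difference sorts its result by default, which is why df3 above lists Age, CVI, month, wgt, year rather than df1's original column order. A small sketch on a hypothetical one-row frame comparing it with DataFrame.drop(columns=...), an alternative not used in this answer, which keeps the remaining columns in their original order:

```python
import pandas as pd

L = ["User_id", "day", "ZIP", "sex"]

# Hypothetical one-row frame with the question's column names
df1 = pd.DataFrame([[0] * 9], columns=["User_id", "month", "day", "Age", "year",
                                       "CVI", "ZIP", "sex", "wgt"])

# Index.difference sorts the labels by default (uppercase sorts before lowercase)
sorted_rest = df1.columns.difference(L)
print(list(sorted_rest))  # ['Age', 'CVI', 'month', 'wgt', 'year']

# DataFrame.drop(columns=...) keeps the remaining columns in df1's order
df3 = df1.drop(columns=L)
print(list(df3.columns))  # ['month', 'Age', 'year', 'CVI', 'wgt']
```

Which one to use depends on whether the column order of df3 matters.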

Or:

import pandas as pd

df2 = df1[L]
df3 = df1[df1.columns.difference(pd.Index(L))] 
print (df2) 
    User_id day ZIP sex 
0  0 7 NaN M 
1  1 7 NaN M 
2  2 7 DM F 
3  3 7 DM M 
4  4 7 DM M 

print (df3) 
    Age CVI month wgt year 
0 16 2  1 NaN 1977 
1 16 3  2 NaN 1977 
2 16 2  3 NaN 1977 
3 16 7  4 NaN 1977 
4 16 3  5 NaN 1977 

You can try:

df2 = df1[list] # it does a projection on the columns contained in the list 
df3 = df1[[col for col in df1.columns if col not in list]] 

Did you mean 'df3 = df1[[col for col in df1.columns if col not in list]]'? – jezrael

Never name a list "list" (it shadows Python's built-in list type):

my_list= [u"User_id", u"day", u"ZIP", u"sex"] 
df2 = df1[df1.keys()[df1.keys().isin(my_list)]] 

Or, equivalently:

df2 = df1[df1.columns[df1.columns.isin(my_list)]]
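A sketch of how the isin-mask selection differs from plain df1[my_list], on a hypothetical one-row frame. For a DataFrame, keys() returns the column Index, so both forms above are the same. The mask keeps df1's column order and silently skips names that are not columns, whereas df1[my_list] follows the list's order and, in pandas 1.0 and later, raises KeyError for a missing name. The name not_a_column is made up for illustration.

```python
import pandas as pd

# Hypothetical one-row frame with the question's column names
df1 = pd.DataFrame([[0] * 9], columns=["User_id", "month", "day", "Age", "year",
                                       "CVI", "ZIP", "sex", "wgt"])

my_list = ["sex", "User_id", "ZIP", "day"]  # deliberately not in df1's order

# For a DataFrame, keys() returns the column Index
print(df1.keys().equals(df1.columns))  # True

by_mask = df1[df1.columns[df1.columns.isin(my_list)]]  # keeps df1's column order
by_list = df1[my_list]                                 # follows the list's order
print(list(by_mask.columns))  # ['User_id', 'day', 'ZIP', 'sex']
print(list(by_list.columns))  # ['sex', 'User_id', 'ZIP', 'day']

# Names that are not columns: the isin mask silently skips them,
# while plain list selection raises KeyError (pandas 1.0 and later)
missing = ["User_id", "not_a_column"]  # "not_a_column" is a made-up name
print(list(df1.columns[df1.columns.isin(missing)]))  # ['User_id']
try:
    df1[missing]
    raised = False
except KeyError:
    raised = True
print(raised)  # True
```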