2016-07-07
3

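Flag similarities between DataFrames in a new column

I want to compare two pandas DataFrames of different lengths and identify the matching index numbers. Where the values match, I want to flag them in a new column.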

df1: 
Index Column 1 
41660 Apple 
41935 Banana 
42100 Strawberry 
42599 Pineapple 

df2: 
Index Column 1 
42599 Pineapple 

Output: 
Index Column 1 'Matching Index?' 
41660 Apple 
41935 Banana 
42100 Strawberry 
42599 Pineapple True 
+0

Possible duplicate of [Comparing 2 columns of two Python Pandas dataframes and getting the common rows](http://stackoverflow.com/questions/30291032/comparing-2-columns-of-two-python-pandas-dataframes-and-getting-the-common - ) – Andy

Answers

4

If these really are the indices, then you can use intersection on the indices:

In [61]: 
df1.loc[df1.index.intersection(df2.index), 'flag'] = True 
df1 

Out[61]: 
     Column 1 flag 
Index     
41660  Apple NaN 
41935  Banana NaN 
42100 Strawberry NaN 
42599 Pineapple True 
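For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the two frames from the question with `Index` as the actual index. The frame construction and the column name `in_df2` are my own assumptions, not part of the answer above. It also shows `Index.isin`, which fills the non-matching rows with False instead of NaN.

```python
import pandas as pd

# Rebuild the question's sample data; 'Index' is the real DataFrame index here.
df1 = pd.DataFrame(
    {"Column 1": ["Apple", "Banana", "Strawberry", "Pineapple"]},
    index=pd.Index([41660, 41935, 42100, 42599], name="Index"),
)
df2 = pd.DataFrame(
    {"Column 1": ["Pineapple"]},
    index=pd.Index([42599], name="Index"),
)

# Index labels present in both frames.
common = df1.index.intersection(df2.index)

# Boolean flag for every row: True where the label also appears in df2.
# 'in_df2' is a hypothetical column name chosen for this sketch.
df1["in_df2"] = df1.index.isin(df2.index)

print(list(common))
print(df1)
```

Which form is preferable likely depends on whether you need a True/NaN marker or a clean boolean column to filter on later.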

Otherwise, use isin:

In [63]: 
df1.loc[df1['Index'].isin(df2['Index']), 'flag'] = True 
df1 

Out[63]: 
    Index Column 1 flag 
0 41660  Apple NaN 
1 41935  Banana NaN 
2 42100 Strawberry NaN 
3 42599 Pineapple True 
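Since the question also asks which index numbers match, the same `isin` mask can be reused to pull them out. This is only a sketch under the assumption that `Index` is an ordinary column and the frames use the default row index; the variable names `mask`, `matching_rows` and `matching_values` are mine.

```python
import pandas as pd

# Sample data from the question, with 'Index' as a regular column.
df1 = pd.DataFrame(
    {
        "Index": [41660, 41935, 42100, 42599],
        "Column 1": ["Apple", "Banana", "Strawberry", "Pineapple"],
    }
)
df2 = pd.DataFrame({"Index": [42599], "Column 1": ["Pineapple"]})

# True for each row of df1 whose 'Index' value also occurs in df2.
mask = df1["Index"].isin(df2["Index"])

# Row labels of df1 that matched, and the matching 'Index' values themselves.
matching_rows = df1.index[mask]
matching_values = df1.loc[mask, "Index"]

print(list(matching_rows))
print(matching_values.tolist())
```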
+1

Thanks, this solved my problem. – zbug

2

+1 to @EdChum's answer. If you can live with a value other than True in your matching column, try:

>>> df1.merge(df2,how='outer',indicator='Flag') 
    Index  Column  Flag 
0 41660  Apple left_only 
1 41935  Banana left_only 
2 42100 Strawberry left_only 
3 42599 Pineapple  both 
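If a boolean is needed after all, the indicator column can be compared against `'both'`. The following is a sketch with the question's data, assuming `Index` and `Column 1` are ordinary columns in both frames; the `Matching` column name is hypothetical.

```python
import pandas as pd

# Sample data from the question, with 'Index' as a regular column.
df1 = pd.DataFrame(
    {
        "Index": [41660, 41935, 42100, 42599],
        "Column 1": ["Apple", "Banana", "Strawberry", "Pineapple"],
    }
)
df2 = pd.DataFrame({"Index": [42599], "Column 1": ["Pineapple"]})

# Merge on both shared columns; 'Flag' records where each row came from
# ('left_only', 'right_only' or 'both').
merged = df1.merge(df2, on=["Index", "Column 1"], how="outer", indicator="Flag")

# Collapse the indicator into a plain boolean column.
merged["Matching"] = merged["Flag"] == "both"

print(merged)
```

Note that with `how='outer'`, rows that exist only in df2 would also appear, marked `right_only`; `how='left'` keeps just the rows of df1.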
2

Using the isin() method:

import pandas as pd 

df1 = pd.DataFrame(
    data=[
        [41660, 'Apple'],
        [41935, 'Banana'],
        [42100, 'Strawberry'],
        [42599, 'Pineapple'],
    ],
    columns=['Index', 'Column 1'],
)

df2 = pd.DataFrame(
    data=[
        [42599, 'Pineapple'],
    ],
    columns=['Index', 'Column 1'],
)

df1['Matching'] = df1['Index'].isin(df2['Index']) 
print(df1) 

Output:

Index Column 1 Matching 
0 41660  Apple False 
1 41935  Banana False 
2 42100 Strawberry False 
3 42599 Pineapple  True 
+1

'isin' was already mentioned in my answer – EdChum