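Unfortunately, I don't think there is a built-in way to do this. pandas joins are fairly limited: you can basically only test a left column for equality with a right column, unlike SQL, where join conditions are more general.
You can still do it by forming the cross product and then checking all the relevant conditions. The cross product holds len(left) * len(right) rows, so it uses some memory, but it shouldn't be too inefficient.
Note that I changed your test cases slightly to make them more general, and renamed the variables to something more intuitive.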
import pandas as pd
from functools import reduce
dataA = [1, 2]
dfA = pd.DataFrame(dataA)
dfA.columns = ['col']
dataB = [(1, None, 1), (None, 2, None), (1, 2, None)]
dfB = pd.DataFrame(dataB)
dfB.columns = ['col1', 'col2', 'col3']
print(dfA)
print(dfB)

def cross(left, right):
    """Returns the cross product of the two dataframes, keeping the index of the left"""
    # create dummy columns on the dataframes that will always match in the merge
    left["_"] = 0
    right["_"] = 0
    # merge, keeping the left index, and dropping the dummy column
    result = left.reset_index().merge(right, on="_").set_index("index").drop("_", axis=1)
    # drop the dummy columns from the mutated dataframes
    left.drop("_", axis=1, inplace=True)
    right.drop("_", axis=1, inplace=True)
    return result

def merge_left_in_right(left_df, right_df):
    """Return the join of the two dataframes where the element of the left dataframe's column
    is in one of the right dataframe's columns"""
    left_col, right_cols = left_df.columns[0], right_df.columns
    result = cross(left_df, right_df)  # form the cross product with a view to filtering it
    # a row must satisfy one of the following conditions:
    tests = (result[left_col] == result[right_col] for right_col in right_cols)
    # form the disjunction of the conditions
    left_in_right = reduce(lambda left_bools, right_bools: left_bools | right_bools, tests)
    # return the appropriate rows
    return result[left_in_right]

print(merge_left_in_right(dfA, dfB))
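
If you are on pandas 1.2 or newer, I believe the dummy-column trick above can be replaced by `merge(..., how="cross")`. Here is a minimal sketch under that assumption. The function name `merge_left_in_right_cross` is just something I made up for illustration, and note that, unlike `cross` above, a cross merge returns a fresh 0..n-1 index instead of keeping the left index.

```python
import pandas as pd

dfA = pd.DataFrame({"col": [1, 2]})
dfB = pd.DataFrame(
    [(1, None, 1), (None, 2, None), (1, 2, None)],
    columns=["col1", "col2", "col3"],
)


def merge_left_in_right_cross(left_df, right_df):
    """Keep the cross-product rows where the left column equals any right column.

    Hypothetical helper; assumes pandas >= 1.2 for how="cross" and that the
    left and right column names do not overlap.
    """
    left_col = left_df.columns[0]
    # every left row paired with every right row, without mutating the inputs
    crossed = left_df.merge(right_df, how="cross")
    # compare each right-hand column with the left column, row by row;
    # comparisons against NaN are False, so missing values never match
    matches = crossed[right_df.columns].eq(crossed[left_col], axis=0).any(axis=1)
    return crossed[matches]


result_cross = merge_left_in_right_cross(dfA, dfB)
print(result_cross)
```

The filtering step is the same disjunction as in `merge_left_in_right`, just written with `eq(...).any(axis=1)` instead of `reduce`.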
I assume the third dataframe isn't exactly what you want. Can you mock up a dataframe that is exactly what you want?
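@PaulH Actually it is, if *ignore_index=True* and *.drop_duplicates()* are applied to eliminate the duplicate rows that occur when a left column value matches more than one right column value.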
So what's the question? It seems like you have your answer.
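
For reference, here is how I read the *ignore_index=True* plus *.drop_duplicates()* approach mentioned in the comments. This is only a sketch: the per-column merge loop is my assumption about what is being concatenated, not something taken from the question.

```python
import pandas as pd

dfA = pd.DataFrame({"col": [1, 2]})
dfB = pd.DataFrame(
    [(1, None, 1), (None, 2, None), (1, 2, None)],
    columns=["col1", "col2", "col3"],
)

# one ordinary equality merge per right-hand column (assumed approach)
pieces = [
    dfA.merge(dfB, left_on="col", right_on=right_col)
    for right_col in dfB.columns
]

# stack the pieces with a fresh index, then drop rows that matched
# through more than one right-hand column
combined = pd.concat(pieces, ignore_index=True).drop_duplicates()
print(combined)
```

This avoids building the full cross product, but `drop_duplicates()` will also collapse rows of the right dataframe that were genuinely identical to begin with, so it is only equivalent to the cross-product filter when the right dataframe has no repeated rows.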