If I understand your question correctly, you are trying to do the equivalent of:
UPDATE table_a A, table_b B SET A.col3 = B.col3 WHERE A.col1 = B.col1; on DataFrames, with 0 where no match exists in B. (See the comments.)
a = [("a",1,100),("b",2,300),("c",3,500),("d",4,700)]
b = [("a",150),("b",350),("d",650)]
df_a = spark.createDataFrame(a,["col1","col2","col3"])
df_b = spark.createDataFrame(b,["col1","col3"])
df_a.show()
# +----+----+----+
# |col1|col2|col3|
# +----+----+----+
# | a| 1| 100|
# | b| 2| 300|
# | c| 3| 500|
# | d| 4| 700|
# +----+----+----+
df_b.show() # I have removed an entry for the purpose of the demo.
# +----+----+
# |col1|col3|
# +----+----+
# | a| 150|
# | b| 350|
# | d| 650|
# +----+----+
You need to perform an outer join followed by a coalesce:
from pyspark.sql import functions as F
df_a.withColumnRenamed('col3', 'col3_a') \
    .join(df_b.withColumnRenamed('col3', 'col3_b'), on='col1', how='outer') \
    .withColumn("col3", F.coalesce('col3_b', F.lit(0))) \
    .drop('col3_a', 'col3_b').show()
# +----+----+----+
# |col1|col2|col3|
# +----+----+----+
# | d| 4| 650|
# | c| 3| 0|
# | b| 2| 350|
# | a| 1| 150|
# +----+----+----+
Is this the final result you need? I'm not sure I understood correctly. – eliasah
It's more like this SQL statement: UPDATE table_a A, table_b B SET A.col3 = B.col3 WHERE A.col1 = B.col1; on DataFrames. 0 if not present in B. – Viv
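To make the intended UPDATE semantics concrete, here is a plain-Python sketch (no Spark required) using the same sample rows as above; the `b_lookup` helper is just for illustration:

```python
# Plain-Python sketch of: UPDATE a SET col3 = b.col3 WHERE a.col1 = b.col1,
# with col3 = 0 when col1 has no match in b.
a = [("a", 1, 100), ("b", 2, 300), ("c", 3, 500), ("d", 4, 700)]
b = [("a", 150), ("b", 350), ("d", 650)]

# Build a lookup from col1 to b's col3, then "update" each row of a,
# defaulting to 0 when the key is absent from b.
b_lookup = dict(b)
updated = [(c1, c2, b_lookup.get(c1, 0)) for (c1, c2, _) in a]
print(updated)
# [('a', 1, 150), ('b', 2, 350), ('c', 3, 0), ('d', 4, 650)]
```

Note that because every key in `df_b` also appears in `df_a` in this example, a `left` join would give the same result; an `outer` join additionally keeps keys that exist only in `df_b` (with a null `col2`).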