pyspark: AnalysisException when joining two data frames

I have created two data frames from sparkSQL:
df1 = sqlContext.sql(""" ...""")
df2 = sqlContext.sql(""" ...""")
I am trying to join the two data frames on the column my_id, like this:
from pyspark.sql.functions import col
combined_df = df1.join(df2, col("df1.my_id") == col("df2.my_id"), 'inner')
Then I got the following error. Any idea what I am missing? Thanks!
AnalysisException Traceback (most recent call last)
<ipython-input-11-45f5313387cc> in <module>()
3 from pyspark.sql.functions import col
4
----> 5 combined_df = df1.join(df2, col("df1.my_id") == col("df2.my_id"), 'inner')
6 combined_df.take(10)
/usr/local/spark-latest/python/pyspark/sql/dataframe.py in join(self, other, on, how)
770 how = "inner"
771 assert isinstance(how, basestring), "how should be basestring"
--> 772 jdf = self._jdf.join(other._jdf, on, how)
773 return DataFrame(jdf, self.sql_ctx)
774
/usr/local/spark-latest/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py in __call__(self, *args)
1131 answer = self.gateway_client.send_command(command)
1132 return_value = get_return_value(
-> 1133 answer, self.gateway_client, self.target_id, self.name)
1134
1135 for temp_arg in temp_args:
/usr/local/spark-latest/python/pyspark/sql/utils.py in deco(*a, **kw)
67 e.java_exception.getStackTrace()))
68 if s.startswith('org.apache.spark.sql.AnalysisException: '):
---> 69 raise AnalysisException(s.split(': ', 1)[1], stackTrace)
70 if s.startswith('org.apache.spark.sql.catalyst.analysis'):
71 raise AnalysisException(s.split(': ', 1)[1], stackTrace)
AnalysisException: "cannot resolve '`df1.my_id`' given input columns: [...