
Spark SQL: selecting an aliased column over a UNION

I am building a view by UNIONing a view with itself, and then trying to select the aliased column I created.

Here is the code I have:

val mergedDF = sparkSession.sqlContext.sql("SELECT COLUMN1 as COLUMN3 FROM MY_VIEW UNION SELECT COLUMN2 as COLUMN3 FROM MY_VIEW")

val mergedView = mergedDF.createOrReplaceTempView("MERGED_VIEW")

val distinctColumnDF = sparkSession.sqlContext.sql("SELECT distinct COLUMN3 FROM MERGED_VIEW WHERE node like '%city%'")

logger.debug("No. of distinct city rows = " + distinctColumnDF.count());

I am getting the following error:

org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'from' expecting {<EOF>, 'WHERE', 'GROUP', 'ORDER', 'HAVING', 'LIMIT', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', 'SORT', 'CLUSTER', 'DISTRIBUTE'}(line 1, pos 21)

== SQL ==
SELECT distinct COLUMN3 from MERGED_VIEW where node like '%city%'
---------------------^^^^

Any help would be appreciated.

Thanks

Answers


You can use the alias method on the DataFrames, like this:

val dfA = sparkSession.sqlContext.sql("SELECT COLUMN1 FROM MY_VIEW")
val dfB = sparkSession.sqlContext.sql("SELECT COLUMN2 FROM MY_VIEW")

// Alias both sides to the same column name before the union.
val mergedDF = dfA.select(dfA.col("COLUMN1").alias("COLUMN3")).union(dfB.select(dfB.col("COLUMN2").alias("COLUMN3")))

val distinctColumnDF = mergedDF.filter(mergedDF.col("COLUMN3").contains("city")).distinct()

logger.debug("No. of distinct rows = " + distinctColumnDF.count());
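For reference, below is a self-contained sketch of this alias-then-union approach on made-up data; the two-column contents of MY_VIEW and the local SparkSession setup are assumptions for illustration, and the snippet can be pasted into spark-shell.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("union-alias-example")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical two-column data standing in for MY_VIEW.
Seq(("new_york_city", "boston"), ("chicago_city", "dallas"))
  .toDF("COLUMN1", "COLUMN2")
  .createOrReplaceTempView("MY_VIEW")

val dfA = spark.sql("SELECT COLUMN1 FROM MY_VIEW")
val dfB = spark.sql("SELECT COLUMN2 FROM MY_VIEW")

// Alias each side to the same name so the union exposes a single COLUMN3.
val mergedDF = dfA.select(dfA.col("COLUMN1").alias("COLUMN3"))
  .union(dfB.select(dfB.col("COLUMN2").alias("COLUMN3")))

val distinctCityCount = mergedDF
  .filter(mergedDF.col("COLUMN3").contains("city"))
  .distinct()
  .count()

println(s"No. of distinct city rows = $distinctCityCount")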

No, still getting the same error. –


The following seems to work:

val dfA = sparkSession.sqlContext.sql("SELECT COLUMN1 FROM MY_VIEW")
val dfB = sparkSession.sqlContext.sql("SELECT COLUMN2 FROM MY_VIEW")

val mergedDF = dfA.select(dfA.col("COLUMN1").alias("COLUMN3")).union(dfB.select(dfB.col("COLUMN2").alias("COLUMN3")))

val distinctCount = mergedDF.select(mergedDF.col("COLUMN3")).filter(mergedDF.col("COLUMN3").contains("test")).distinct().count()

logger.debug("No. of distinct rows = " + distinctCount);

I am not sure why the pure SQL syntax didn't work.
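For comparison, here is a sketch of the pure-SQL form on the same kind of toy data; it assumes the filter is meant to run on COLUMN3, since that is the only column MERGED_VIEW exposes (the snippet in the question filters on node, which is not part of the merged view), so treat it as a guess at the intent rather than a diagnosis of the original error.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("union-alias-sql")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical data standing in for MY_VIEW.
Seq(("new_york_city", "boston"), ("chicago_city", "dallas"))
  .toDF("COLUMN1", "COLUMN2")
  .createOrReplaceTempView("MY_VIEW")

// UNION of the two aliased selects, registered as a view.
spark.sql("SELECT COLUMN1 AS COLUMN3 FROM MY_VIEW UNION SELECT COLUMN2 AS COLUMN3 FROM MY_VIEW")
  .createOrReplaceTempView("MERGED_VIEW")

// Filter on the aliased column; a WHERE on a column that MERGED_VIEW
// does not expose would fail to resolve at analysis time.
val distinctCityCount = spark.sql("SELECT DISTINCT COLUMN3 FROM MERGED_VIEW WHERE COLUMN3 LIKE '%city%'").count()

println(s"No. of distinct city rows = $distinctCityCount")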

Many thanks to FiagB for the suggestion.