Spark SQL group by: add to group by or wrap in first() if you don't care which value you get

I have a Spark SQL query like this:

select count(ts), truncToHour(ts) 
from myTable 
group by truncToHour(ts)

where ts is of timestamp type and truncToHour is a UDF that truncates a timestamp to the hour. This query does not work. If I try:

select count(ts), ts from myTable group by truncToHour(ts)

I get the error "expression 'ts' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() if you don't care which value you get.;" but first() is not defined when I run:

select count(ts), first(ts) from myTable group by truncToHour(ts)

Is there any way to get what I want without using a subquery? Also, why does the message say "wrap in first()" when first() is not defined?
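For reference, the result the queries above are after is a row count per hour bucket. Below is a minimal plain-Python sketch of that computation, outside Spark. It assumes truncToHour simply zeroes the minutes, seconds, and sub-second part; the UDF's actual definition is not shown in the question, so `trunc_to_hour` and the sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime


def trunc_to_hour(ts: datetime) -> datetime:
    """Hypothetical stand-in for the truncToHour UDF: drop everything below the hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


# Made-up sample rows standing in for the ts column of myTable.
rows = [
    datetime(2015, 7, 9, 22, 5, 0),
    datetime(2015, 7, 9, 22, 33, 19),
    datetime(2015, 7, 9, 23, 1, 0),
]

# Rough equivalent of: count(ts) ... group by truncToHour(ts)
counts = Counter(trunc_to_hour(ts) for ts in rows)

for hour, n in sorted(counts.items()):
    print(hour, n)
```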


2015-07-09 Paul Z Wu

I found a solution:

SELECT max(truncHour(ts)), COUNT(ts) FROM myTable GROUP BY truncHour(ts)

or

SELECT truncHour(max(ts)), count(ts) FROM myTable GROUP BY truncHour(ts)

Is there a better solution?
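Both forms work because every row in a group shares the same truncated hour, so wrapping the selected expression in max() turns it into an aggregate without changing its value. The sketch below checks this using Python's built-in sqlite3 module as a stand-in for Spark, an assumption made only so the example runs without a cluster. The `truncHour` registered here is a hypothetical string-based version, and the sample data is made up.

```python
import sqlite3


def trunc_hour(ts: str) -> str:
    """Hypothetical truncHour: keep 'YYYY-MM-DD HH' and zero minutes and seconds."""
    return ts[:13] + ":00:00"


conn = sqlite3.connect(":memory:")
# Register the Python function as a one-argument SQL function.
conn.create_function("truncHour", 1, trunc_hour)
conn.execute("CREATE TABLE myTable (ts TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?)",
    [("2015-07-09 22:05:00",), ("2015-07-09 22:33:19",), ("2015-07-09 23:01:00",)],
)

# First form from the answer: aggregate over the truncated value.
q1 = sorted(conn.execute(
    "SELECT max(truncHour(ts)), COUNT(ts) FROM myTable GROUP BY truncHour(ts)"
).fetchall())

# Second form: truncate the per-group maximum.
q2 = sorted(conn.execute(
    "SELECT truncHour(max(ts)), count(ts) FROM myTable GROUP BY truncHour(ts)"
).fetchall())

print(q1)
print(q2)
```

SQLite is more permissive than Spark about non-aggregated columns, so this only illustrates that the two forms agree, not Spark's analysis error.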


2015-07-09 22:33:19

Yes, this works. Thanks.

https://issues.apache.org/jira/browse/SPARK-9210

It seems the actual function is FIRST_VALUE.


2015-09-08 11:43:18
