2016-04-02

Hive: find the start and end of groups (change points)

Here is the table:

+------+------+
| Name | Time |
+------+------+
| A    | 1    |
| A    | 2    |
| A    | 3    |
| A    | 4    |
| B    | 5    |
| B    | 6    |
| A    | 7    |
| B    | 8    |
| B    | 9    |
| B    | 10   |
+------+------+

I want to write a query that returns:

+------+-------+-----+
| Name | Start | End |
+------+-------+-----+
| A    | 1     | 4   |
| B    | 5     | 6   |
| A    | 7     | 7   |
| B    | 8     | 10  |
+------+-------+-----+

Does anyone know how to do this?


This is known as the islands problem. I don't know Hive, but in SQL Server we solve it with window functions. Here is a **[DEMO](http://www.sqlfiddle.com/#!3/9eecb7db59d16c80417c72d1/6243)** –


Thanks a lot! – GoGoGo
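The comment above names the technique. One common window-function form of it (not necessarily the one in the linked demo) subtracts a per-name row number from a global row number: inside a run of consecutive rows with the same name both numbers grow by 1 per row, so their difference is constant and labels the run. The sketch below runs that query with Python's built-in sqlite3 as a stand-in for Hive. It assumes SQLite 3.25+ (needed for window functions) and borrows the table name `foo` from the answer below, since the question does not name the table. Hive also supports `ROW_NUMBER() OVER (...)`, so the SQL should carry over, but treat it as untested on Hive.

```python
import sqlite3

# Sample data from the question; the table name "foo" is an assumption
# borrowed from the answer (the question does not name the table).
ROWS = [("A", 1), ("A", 2), ("A", 3), ("A", 4), ("B", 5),
        ("B", 6), ("A", 7), ("B", 8), ("B", 9), ("B", 10)]

# Gaps-and-islands via the difference of two row numbers: within one run of
# same-name rows (in time order) both row numbers increase together, so
# their difference (grp) stays constant for the whole run.
QUERY = """
SELECT name, MIN(time) AS start_time, MAX(time) AS end_time
FROM (
    SELECT name, time,
           ROW_NUMBER() OVER (ORDER BY time)
         - ROW_NUMBER() OVER (PARTITION BY name ORDER BY time) AS grp
    FROM foo
) t
GROUP BY name, grp
ORDER BY start_time
"""


def find_islands(rows):
    """Return (name, start, end) tuples, one per run, ordered by start."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("CREATE TABLE foo (name TEXT, time INTEGER)")
        conn.executemany("INSERT INTO foo VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in find_islands(ROWS):
        print(row)
```

Because this numbers rows instead of doing arithmetic on Time itself, it does not require Time to be a gap-free integer sequence: for input `[("A", 1), ("A", 2), ("A", 5)]` it reports a single run `("A", 1, 5)`.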

Answer


This isn't the most efficient way, but it works.

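-- `start`, `end` and `time` are backticked because they may be reserved words, depending on the Hive version
SELECT name, min(`time`) AS `start`, max(`time`) AS `end`
FROM (
    -- Within a run of consecutive times for one name, time - DENSE_RANK() is constant,
    -- so diff identifies the run (assumes Time is a gap-free sequence of integers)
    SELECT name, `time`,
           `time` - DENSE_RANK() OVER (PARTITION BY name ORDER BY `time`) AS diff
    FROM foo
) t
GROUP BY name, diff
ORDER BY `start`;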
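To see why the `time - DENSE_RANK()` trick in the query above works, and where it stops working, here is a pure-Python sketch that mimics it step by step. It assumes each Time value is a unique integer, in which case DENSE_RANK within a name is simply the 1-based position of that time in the name's sorted list. The function name `islands_by_rank_diff` is hypothetical.

```python
from itertools import groupby

# The question's sample data as (name, time) pairs.
SAMPLE = [("A", 1), ("A", 2), ("A", 3), ("A", 4), ("B", 5),
          ("B", 6), ("A", 7), ("B", 8), ("B", 9), ("B", 10)]


def islands_by_rank_diff(rows):
    """Mimic: GROUP BY name, time - DENSE_RANK() OVER (PARTITION BY name ORDER BY time).

    Assumes unique integer times, so DENSE_RANK equals the 1-based position
    of each time within its name's sorted times.
    """
    keyed = []
    # sorted() orders by name, then time, which is what PARTITION BY/ORDER BY need
    for name, group in groupby(sorted(rows), key=lambda r: r[0]):
        for rank, (_, t) in enumerate(group, start=1):
            keyed.append((name, t - rank, t))  # (name, diff, time)

    # GROUP BY name, diff -> MIN/MAX of time per group
    result = []
    for (name, _diff), grp in groupby(sorted(keyed), key=lambda r: (r[0], r[1])):
        ts = [t for _, _, t in grp]
        result.append((name, min(ts), max(ts)))
    return sorted(result, key=lambda r: r[1])  # ORDER BY start


if __name__ == "__main__":
    print(islands_by_rank_diff(SAMPLE))
    # → [('A', 1, 4), ('B', 5, 6), ('A', 7, 7), ('B', 8, 10)]

    # Hypothetical input with a hole in Time: one uninterrupted run of "A"
    # is split in two, because diff jumps wherever Time skips a value.
    print(islands_by_rank_diff([("A", 1), ("A", 2), ("A", 5)]))
    # → [('A', 1, 2), ('A', 5, 5)]
```

So the query is correct for the question's data, where Time runs 1, 2, 3, ... with no holes. If Time can skip values, numbering rows globally (ROW_NUMBER over all rows minus ROW_NUMBER per name) avoids that dependency.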

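I'd also suggest trying the following query, which collects each name's times into a sorted array, and then writing a GenericUDF that finds the gaps in that array; that may be easier :)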

SELECT name, sort_array(collect_list(`time`)) AS times FROM foo GROUP BY name;
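The `sort_array(collect_list(...))` query yields one sorted array per name (for the sample data, `[1, 2, 3, 4, 7]` for A and `[5, 6, 8, 9, 10]` for B). A Hive GenericUDF would be written in Java. The sketch below only illustrates, in Python, the gap-finding logic such a UDF might apply to each array. Like the rank-difference query, it assumes Time is a gap-free integer sequence, so "next value is not previous + 1" marks a change point. The function name `split_runs` is hypothetical.

```python
def split_runs(sorted_times):
    """Split an ascending list of integer times into (start, end) runs.

    A new run begins wherever the next time is not exactly previous + 1.
    """
    runs = []
    for t in sorted_times:
        if runs and t == runs[-1][1] + 1:
            runs[-1][1] = t        # extend the current run
        else:
            runs.append([t, t])    # gap found: start a new run
    return [tuple(r) for r in runs]


if __name__ == "__main__":
    # Arrays as the collect_list query would return them for the question's data
    per_name = {"A": [1, 2, 3, 4, 7], "B": [5, 6, 8, 9, 10]}
    rows = [(name, s, e) for name, ts in per_name.items() for s, e in split_runs(ts)]
    print(sorted(rows, key=lambda r: r[1]))
    # → [('A', 1, 4), ('B', 5, 6), ('A', 7, 7), ('B', 8, 10)]
```

In Hive, the per-name runs would still have to be flattened back into rows (for example with a `LATERAL VIEW explode(...)`) and ordered by start to match the desired output.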