Hive: Unable to fetch a column that is not in GROUP BY

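I have a table in Hive named purchase_data that lists all purchases.
I need to query this table and find the cust_id, product_id and price of the most expensive product each customer bought.
The data in the purchase_data table looks like this: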

cust_id       product_id       price  purchase_data
---------------------------------------------------
aiman_sarosh  apple_iphone5s   55000  01-01-2014
aiman_sarosh  apple_iphone6s   65000  01-01-2017
jeff_12       apple_iphone6s   65000  01-01-2017
jeff_12       dell_vostro      70000  01-01-2017
missy_el      lenovo_thinkpad  70000  01-02-2017

I wrote the code below, but it doesn't return the right rows, and some rows
come back duplicated:

select master.cust_id, master.product_id, master.price 
from 
(
    select cust_id, product_id, price 
    from purchase_data 
) as master 
join 
(
    select cust_id, max(price) as price 
    from purchase_data 
    group by cust_id 
) as max_amt_purchase 
on max_amt_purchase.price = master.price;

Output:

aiman_sarosh apple_iphone6s 65000.0 
jeff_12   apple_iphone6s 65000.0 
jeff_12   dell_vostro  70000.0 
jeff_12   dell_vostro  70000.0 
missy_el  lenovo_thinkpad 70000.0 
missy_el  lenovo_thinkpad 70000.0 
Time taken: 21.666 seconds, Fetched: 6 row(s)

Is there something wrong with the code?
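To see where the extra rows come from, here is a minimal sketch that replays the question's query on the sample data, using Python's built-in sqlite3 module as a stand-in for Hive (an assumption; the join is plain SQL and behaves the same way). The purchase date column is left out because the query never reads it. The join compares only price, so a purchase is returned once for every customer whose maximum happens to equal its price.

```python
import sqlite3
from collections import Counter

# Sample rows from the question (date column omitted).
# SQLite stands in for Hive here; this is an assumption for illustration.
ROWS = [
    ("aiman_sarosh", "apple_iphone5s", 55000),
    ("aiman_sarosh", "apple_iphone6s", 65000),
    ("jeff_12", "apple_iphone6s", 65000),
    ("jeff_12", "dell_vostro", 70000),
    ("missy_el", "lenovo_thinkpad", 70000),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table purchase_data (cust_id text, product_id text, price real)")
conn.executemany("insert into purchase_data values (?, ?, ?)", ROWS)

# Same shape as the question's query: the join condition compares price only,
# so rows can match another customer's maximum.
result = conn.execute("""
    select master.cust_id, master.product_id, master.price
    from purchase_data as master
    join (select cust_id, max(price) as price
          from purchase_data
          group by cust_id) as max_amt_purchase
      on max_amt_purchase.price = master.price
""").fetchall()

# Count how many times each purchase row came back.
counts = Counter(result)
for row, n in sorted(counts.items()):
    print(row, n)
```

jeff_12's dell_vostro and missy_el's lenovo_thinkpad each come back twice because both customers' maximum is 70000, and jeff_12's apple_iphone6s shows up even though it is not his most expensive purchase, because it matches aiman_sarosh's maximum of 65000. Adding cust_id to the join condition removes these cross-customer matches.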

2017-02-09 aiman

Use row_number():

-- number each customer's purchases from most to least expensive, keep the first
select pd.* 
from (select pd.*, 
             row_number() over (partition by cust_id order by price desc) as seqnum 
      from purchase_data pd 
     ) pd 
where seqnum = 1;

This returns one row per cust_id, even when there are ties. If you want multiple rows when there are ties, use rank() or dense_rank() instead of row_number().
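The difference between row_number(), rank() and dense_rank() only shows up when a customer has a tie at their top price. The question's data has none, so this sketch adds one hypothetical row (hypothetical_laptop, also 70000 for missy_el) and runs the same query shape with each function. Python's sqlite3 again stands in for Hive (an assumption; SQLite needs version 3.25 or later for window functions).

```python
import sqlite3

# Question's rows plus one HYPOTHETICAL row that ties missy_el's top price.
ROWS = [
    ("aiman_sarosh", "apple_iphone5s", 55000),
    ("aiman_sarosh", "apple_iphone6s", 65000),
    ("jeff_12", "apple_iphone6s", 65000),
    ("jeff_12", "dell_vostro", 70000),
    ("missy_el", "lenovo_thinkpad", 70000),
    ("missy_el", "hypothetical_laptop", 70000),  # hypothetical tie
]

conn = sqlite3.connect(":memory:")
conn.execute("create table purchase_data (cust_id text, product_id text, price real)")
conn.executemany("insert into purchase_data values (?, ?, ?)", ROWS)


def top_per_customer(rank_fn):
    """Keep rows numbered 1 within each cust_id, highest price first.

    rank_fn is one of the fixed names below, so formatting it into the SQL is safe.
    """
    sql = f"""
        select cust_id, product_id, price
        from (select pd.*,
                     {rank_fn}() over (partition by cust_id order by price desc) as seqnum
              from purchase_data pd) pd
        where seqnum = 1
        order by cust_id, product_id
    """
    return conn.execute(sql).fetchall()


for fn in ("row_number", "rank", "dense_rank"):
    print(fn, len(top_per_customer(fn)))
```

row_number() keeps exactly one row per customer (which of missy_el's two tied rows survives is arbitrary), while rank() and dense_rank() keep both tied rows. With a filter of seqnum = 1 those two behave identically; they differ only in the numbers they assign after a tie.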

2017-02-09 12:25:47

Thanks @Gordon, I changed the code and it works. I've posted the solution. :) – aiman

@aiman . . . Using a join plus aggregation where a ranking function will do wastes resources and makes the query more complicated. –

I changed the code, and now it works:

select master.cust_id, master.product_id, master.price 
from 
purchase_data as master, 
(
    select cust_id, max(price) as price 
    from purchase_data 
    group by cust_id 
) as max_price 
where master.cust_id=max_price.cust_id and master.price=max_price.price;

Output:

aiman_sarosh apple_iphone6s 65000.0 
missy_el  lenovo_thinkpad 70000.0 
jeff_12   dell_vostro  70000.0 

Time taken: 55.788 seconds, Fetched: 3 row(s)
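As a quick check, this sketch runs the posted join-based fix and the row_number() answer (with the ordering written as price desc) on the question's sample rows and compares the results. Python's sqlite3 stands in for Hive, which is an assumption made only so the queries can run locally.

```python
import sqlite3

# Sample rows from the question (date column omitted).
ROWS = [
    ("aiman_sarosh", "apple_iphone5s", 55000),
    ("aiman_sarosh", "apple_iphone6s", 65000),
    ("jeff_12", "apple_iphone6s", 65000),
    ("jeff_12", "dell_vostro", 70000),
    ("missy_el", "lenovo_thinkpad", 70000),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table purchase_data (cust_id text, product_id text, price real)")
conn.executemany("insert into purchase_data values (?, ?, ?)", ROWS)

# The posted fix: implicit (comma) join filtered on both cust_id and price.
JOIN_FIX = """
select master.cust_id, master.product_id, master.price
from purchase_data as master,
     (select cust_id, max(price) as price
      from purchase_data
      group by cust_id) as max_price
where master.cust_id = max_price.cust_id and master.price = max_price.price
"""

# The row_number() answer: one row per customer, most expensive first.
ROW_NUMBER = """
select cust_id, product_id, price
from (select pd.*,
             row_number() over (partition by cust_id order by price desc) as seqnum
      from purchase_data pd) pd
where seqnum = 1
"""

# Sort both results so the comparison ignores row order.
join_rows = sorted(conn.execute(JOIN_FIX).fetchall())
rn_rows = sorted(conn.execute(ROW_NUMBER).fetchall())
print(join_rows == rn_rows)
```

The two agree here only because no customer has two purchases tied at their top price. With a tie, the join returns every tied row, so it behaves like rank() rather than row_number().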

2017-02-10 10:19:46 aiman
