2017-02-09 99 views

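Hive: unable to select columns that are not in the GROUP BY

I have a table in Hive named purchase_data that lists every purchase made.
I need to query this table and find the cust_id, product_id, and price of the most expensive product each customer has purchased.
The data in the purchase_data table looks like this: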

cust_id       product_id       price  purchase_data
----------------------------------------------------
aiman_sarosh  apple_iphone5s   55000  01-01-2014
aiman_sarosh  apple_iphone6s   65000  01-01-2017
jeff_12       apple_iphone6s   65000  01-01-2017
jeff_12       dell_vostro      70000  01-01-2017
missy_el      lenovo_thinkpad  70000  01-02-2017

I wrote the query below, but it does not fetch the right rows. Some rows come back
duplicated:

select master.cust_id, master.product_id, master.price 
from 
(
    select cust_id, product_id, price 
    from purchase_data 
) as master 
join 
(
    select cust_id, max(price) as price 
    from purchase_data 
    group by cust_id 
) as max_amt_purchase 
on max_amt_purchase.price = master.price; 

Output:

aiman_sarosh apple_iphone6s 65000.0 
jeff_12   apple_iphone6s 65000.0 
jeff_12   dell_vostro  70000.0 
jeff_12   dell_vostro  70000.0 
missy_el  lenovo_thinkpad 70000.0 
missy_el  lenovo_thinkpad 70000.0 
Time taken: 21.666 seconds, Fetched: 6 row(s) 

Is there something wrong with the query?

Answers

0

Use row_number():

select pd.* 
from (select pd.*, 
      row_number() over (partition by cust_id order by price desc) as seqnum 
     from purchase_data pd 
    ) pd 
where seqnum = 1; 

This returns one row per cust_id, even when there are ties. If you want multiple rows when there are ties, use rank() or dense_rank() instead of row_number().
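To make the tie behaviour concrete, here is a minimal sketch that runs the same windowed query with row_number() and with rank(). It is not Hive: it assumes Python's built-in sqlite3 module (SQLite 3.25 or later, for window functions) as a stand-in, and the hp_spectre row is a hypothetical purchase added only to create a tie for jeff_12. The helper name top_purchases and the alias ranked are likewise made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the Hive table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table purchase_data (cust_id text, product_id text, price real)"
)
rows = [
    ("aiman_sarosh", "apple_iphone5s", 55000),
    ("aiman_sarosh", "apple_iphone6s", 65000),
    ("jeff_12", "apple_iphone6s", 65000),
    ("jeff_12", "dell_vostro", 70000),
    ("jeff_12", "hp_spectre", 70000),  # hypothetical row: ties with dell_vostro
    ("missy_el", "lenovo_thinkpad", 70000),
]
conn.executemany("insert into purchase_data values (?, ?, ?)", rows)


def top_purchases(ranking_fn):
    """Return each customer's top-priced rows using the given window function."""
    query = f"""
        select cust_id, product_id, price
        from (select pd.*,
                     {ranking_fn}() over (partition by cust_id
                                          order by price desc) as seqnum
              from purchase_data pd
             ) ranked
        where seqnum = 1
        order by cust_id, product_id
    """
    return conn.execute(query).fetchall()


one_per_customer = top_purchases("row_number")  # exactly one row per cust_id
with_ties = top_purchases("rank")               # keeps every tied top row

print(one_per_customer)
print(with_ties)
```

With row_number() the tie for jeff_12 is broken arbitrarily, so only one of his two 70000 purchases comes back; with rank() both do.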


Thanks @Gordon, I changed the code and it works. I have posted the solution. :) – aiman


@aiman . . . Using a join and an aggregation when a ranking function does the job wastes resources and makes the query more complicated. –

0

I changed the code, and now it works:

select master.cust_id, master.product_id, master.price 
from 
purchase_data as master, 
(
    select cust_id, max(price) as price 
    from purchase_data 
    group by cust_id 
) as max_price 
where master.cust_id=max_price.cust_id and master.price=max_price.price; 

Output:

aiman_sarosh apple_iphone6s 65000.0 
missy_el  lenovo_thinkpad 70000.0 
jeff_12   dell_vostro  70000.0 

Time taken: 55.788 seconds, Fetched: 3 row(s)
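The difference between the two join queries appears to come down to the join condition: the question's version matches on price alone, so a 70000 purchase pairs with every customer whose maximum is 70000, while this version also matches on cust_id. The sketch below reproduces both on the sample data. It assumes Python's sqlite3 module as a stand-in for Hive, and the variable names price_only and cust_and_price are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the Hive table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table purchase_data (cust_id text, product_id text, price real)"
)
conn.executemany(
    "insert into purchase_data values (?, ?, ?)",
    [
        ("aiman_sarosh", "apple_iphone5s", 55000),
        ("aiman_sarosh", "apple_iphone6s", 65000),
        ("jeff_12", "apple_iphone6s", 65000),
        ("jeff_12", "dell_vostro", 70000),
        ("missy_el", "lenovo_thinkpad", 70000),
    ],
)

# Shared query shape; only the join condition changes between the two runs.
template = """
    select master.cust_id, master.product_id, master.price
    from purchase_data as master
    join (select cust_id, max(price) as price
          from purchase_data
          group by cust_id
         ) as max_price
      on {condition}
    order by master.cust_id, master.product_id
"""

# Join on price only, as in the question: rows pair up across customers.
price_only = conn.execute(
    template.format(condition="master.price = max_price.price")
).fetchall()

# Join on cust_id and price, as in this answer: one match per customer here.
cust_and_price = conn.execute(
    template.format(
        condition="master.cust_id = max_price.cust_id "
                  "and master.price = max_price.price"
    )
).fetchall()

print(len(price_only), len(cust_and_price))
```

On this data the price-only join yields 6 rows and the cust_id-plus-price join yields 3, matching the two outputs shown in the thread.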