2016-12-27 72 views
2

SQL: Ranking/filtering by total rank

I have a pretty standard "append-only" table in Amazon Redshift, with created_at and group_name columns.

I want to produce a time series of the top N rows, by group, over the past [time range].

Currently I'm using this:

SELECT 
    date_trunc('day', created_at) AS timeseries, 
    my_table.group_name, 
    COUNT(*) AS count 
FROM 
    my_table 
JOIN (
    SELECT 
     group_name, 
     ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) AS rank 
    FROM 
     my_table 
    WHERE 
     created_at > (CURRENT_DATE - INTERVAL '1 days') 
    GROUP BY 
     group_name 
    ) ranking ON (ranking.group_name = my_table.group_name) 
WHERE 
    created_at > (CURRENT_DATE - INTERVAL '1 days') 
GROUP BY 
    timeseries, 
    my_table.group_name, 
    ranking.rank 
HAVING 
    ranking.rank <= 5 
ORDER BY 
    timeseries DESC 

This is error-prone to change: the created_at range filter appears twice, so both copies have to be updated whenever the range changes.

Is there a way to make this query more elegant (ideally, writing the time filter only once)?

+0

Do you want the top 5 rows per group? This seems to select the top 5 groups with the most rows. – systemjack

Answers

0

You can add created_at to the join condition, for example by computing the max and min of created_at in the subquery and joining on BETWEEN:

SELECT 
    date_trunc('day', created_at) AS timeseries, 
    my_table.group_name, 
    COUNT(*) AS count 
FROM 
    my_table 
JOIN (
    SELECT 
     group_name, 
     max(created_at) as max_created, 
     min(created_at) as min_created, 
     ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) AS rank 
    FROM 
     my_table 
    WHERE 
     created_at > (CURRENT_DATE - INTERVAL '1 days') 
    GROUP BY 
     group_name 
    ) ranking ON (ranking.group_name = my_table.group_name) 
AND my_table.created_at BETWEEN ranking.min_created AND ranking.max_created 
GROUP BY 
    timeseries, 
    my_table.group_name, 
    ranking.rank 
HAVING 
    ranking.rank <= 5 
ORDER BY 
    timeseries DESC 

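Since every row in the window lies between its group's min and max created_at, the BETWEEN still picks up all of the data. I'm sure there is a more elegant way to compute this without reading the same table twice.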
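A minimal sketch, assuming SQLite (through Python's built-in sqlite3 module, version 3.25+ for ROW_NUMBER) as a stand-in for Redshift, that checks the min/max join returns the same rows as the question's query on a small made-up dataset. The adaptations are assumptions, not part of either original query: date() replaces date_trunc('day', ...), the window start is a bound :cutoff parameter instead of CURRENT_DATE - INTERVAL '1 days', the top-N limit is a :n parameter, and the helper run() and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample rows: (created_at, group_name). ISO-8601 text
# timestamps sort chronologically when compared as strings in SQLite.
ROWS = (
    [("2016-12-20 12:00:00", "a")]  # older than the cutoff used below
    + [("2016-12-26 09:00:00", "a")] * 3
    + [("2016-12-26 10:00:00", "b")] * 2
    + [("2016-12-27 08:00:00", "a"), ("2016-12-27 09:00:00", "c")]
)

# Shape of the question's query: the time filter is written twice.
FILTER_TWICE = """
SELECT date(t.created_at) AS timeseries, t.group_name, COUNT(*) AS count
FROM my_table t
JOIN (
    SELECT group_name,
           ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) AS rank
    FROM my_table
    WHERE created_at > :cutoff
    GROUP BY group_name
) ranking ON ranking.group_name = t.group_name
WHERE t.created_at > :cutoff AND ranking.rank <= :n
GROUP BY date(t.created_at), t.group_name
ORDER BY timeseries DESC, t.group_name
"""

# Shape of the min/max answer: the time filter is written once, and the
# outer rows are restricted to each ranked group's [min, max] created_at.
FILTER_ONCE = """
SELECT date(t.created_at) AS timeseries, t.group_name, COUNT(*) AS count
FROM my_table t
JOIN (
    SELECT group_name,
           MIN(created_at) AS min_created,
           MAX(created_at) AS max_created,
           ROW_NUMBER() OVER (ORDER BY COUNT(*) DESC) AS rank
    FROM my_table
    WHERE created_at > :cutoff
    GROUP BY group_name
) ranking ON ranking.group_name = t.group_name
        AND t.created_at BETWEEN ranking.min_created AND ranking.max_created
WHERE ranking.rank <= :n
GROUP BY date(t.created_at), t.group_name
ORDER BY timeseries DESC, t.group_name
"""


def run(sql, cutoff="2016-12-26 00:00:00", n=2):
    """Load ROWS into an in-memory table and run one of the queries above."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE my_table (created_at TEXT, group_name TEXT)")
        con.executemany("INSERT INTO my_table VALUES (?, ?)", ROWS)
        return con.execute(sql, {"cutoff": cutoff, "n": n}).fetchall()
    finally:
        con.close()
```

Both run(FILTER_TWICE) and run(FILTER_ONCE) should return the same list. The 2016-12-20 row for group a is dropped in both: by the repeated WHERE clause in one, and by the BETWEEN range in the other.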

+0

Care to share more about the more elegant way? – LiraNuna

+0

I'd need more information for that. For example, is group_name unique on a daily basis? – user3600910

+0

No, it's a BYTEDICT-encoded column holding fewer than 40 distinct values in total. – LiraNuna

0

Try this one; it should also be faster:

SELECT 
     ranking.created_day AS timeseries, 
     ranking.group_name, 
     COUNT(*) AS count 
    FROM 
     my_table 
    JOIN (
     SELECT 
      group_name, 
      date(created_at) AS created_day, 
      ROW_NUMBER() OVER (PARTITION BY date(created_at) ORDER BY COUNT(*) DESC) AS rank 
     FROM 
      my_table 
     WHERE 
      created_at > (CURRENT_DATE - INTERVAL '1 days') 
     GROUP BY 
      group_name, 
      date(created_at) 
     ) ranking ON ranking.group_name = my_table.group_name 
      AND ranking.created_day = date(my_table.created_at) 
WHERE ranking.rank <= 5 
GROUP BY 1, 2 
+0

ROW_NUMBER isn't quite right here: if a group dips somewhere in the time series, the top N won't be shown correctly. – LiraNuna

0

I don't think I fully understand your requirements, but this query should give the top 5 groups for each day.

select timeseries, group_name, count from (
    select timeseries, group_name, count, 
     row_number() over (partition by timeseries order by count desc) as rank 
    from (
     select date_trunc('day', created_at) AS timeseries, 
      group_name, 
      count(*) AS count 
     from my_table 
     where created_at > sysdate - '1 day'::interval 
     group by 1,2 
    ) 
) where rank <= 5 
order by 1 desc 
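A minimal sketch of the per-day query above, assuming SQLite (Python's sqlite3 module, 3.25+ for window functions) as a stand-in for Redshift. Assumed adaptations: date() replaces date_trunc('day', ...), the time window is a bound :cutoff parameter instead of sysdate - '1 day'::interval, the limit of 5 becomes a :n parameter, the derived tables get aliases, and group_name is added as a tie-breaker so ROW_NUMBER picks a deterministic winner when two groups have equal counts. The function top_n_per_day and the sample rows are hypothetical.

```python
import sqlite3

# Port of the per-day query: the inner query applies the time window once,
# the middle query numbers groups within each day, the outer keeps the top n.
PER_DAY_TOP_N = """
SELECT timeseries, group_name, count FROM (
    SELECT timeseries, group_name, count,
           ROW_NUMBER() OVER (PARTITION BY timeseries
                              ORDER BY count DESC, group_name) AS rank
    FROM (
        SELECT date(created_at) AS timeseries, group_name, COUNT(*) AS count
        FROM my_table
        WHERE created_at > :cutoff
        GROUP BY date(created_at), group_name
    ) AS daily
) AS ranked
WHERE rank <= :n
ORDER BY timeseries DESC, count DESC
"""


def top_n_per_day(rows, cutoff, n):
    """Run PER_DAY_TOP_N against (created_at, group_name) rows in memory."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE my_table (created_at TEXT, group_name TEXT)")
        con.executemany("INSERT INTO my_table VALUES (?, ?)", rows)
        return con.execute(PER_DAY_TOP_N, {"cutoff": cutoff, "n": n}).fetchall()
    finally:
        con.close()


# Hypothetical sample rows; group "c" is only in the top 2 on 2016-12-27.
SAMPLE = (
    [("2016-12-20 12:00:00", "a")]  # outside the window
    + [("2016-12-26 09:00:00", "a")] * 3
    + [("2016-12-26 10:00:00", "b")] * 2
    + [("2016-12-26 11:00:00", "c")]
    + [("2016-12-27 08:00:00", "c")] * 3
    + [("2016-12-27 09:00:00", "a")]
)
```

With top_n_per_day(SAMPLE, "2016-12-26 00:00:00", 2), the set of groups changes from day to day (a and b on the 26th, c and a on the 27th), because each day is ranked on its own.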

This query should give the daily counts for the top 5 groups overall:

with daily_counts as (
    select date_trunc('day', created_at) AS timeseries, 
     group_name, 
     count(*) AS count 
    from my_table 
    where created_at > sysdate - '1 day'::interval 
    group by 1,2 
) 
select d.timeseries, d.group_name, d.count 
from daily_counts d 
join (
    select group_name, sum(count) as total 
    from daily_counts 
    group by group_name order by total desc 
    limit 5 
) r on d.group_name=r.group_name 
order by 1,3 desc
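A minimal sketch of the CTE query above, assuming SQLite (Python's sqlite3 module) as a stand-in for Redshift, showing that the time window is written only once and that a group which dips on one day still appears. Assumed adaptations: date() replaces date_trunc('day', ...), the window start is a bound :cutoff parameter, LIMIT 5 becomes LIMIT :n, and group_name is added as a tie-breaker in the ranking subquery. The function overall_top_n and the sample rows are hypothetical.

```python
import sqlite3

# Port of the CTE query: daily_counts applies the time window once and is
# reused both for the output rows and for picking the overall top n groups.
OVERALL_TOP_N = """
WITH daily_counts AS (
    SELECT date(created_at) AS timeseries, group_name, COUNT(*) AS count
    FROM my_table
    WHERE created_at > :cutoff
    GROUP BY date(created_at), group_name
)
SELECT d.timeseries, d.group_name, d.count
FROM daily_counts d
JOIN (
    SELECT group_name, SUM(count) AS total
    FROM daily_counts
    GROUP BY group_name
    ORDER BY total DESC, group_name
    LIMIT :n
) r ON d.group_name = r.group_name
ORDER BY d.timeseries, d.count DESC
"""


def overall_top_n(rows, cutoff, n):
    """Run OVERALL_TOP_N against (created_at, group_name) rows in memory."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE my_table (created_at TEXT, group_name TEXT)")
        con.executemany("INSERT INTO my_table VALUES (?, ?)", rows)
        return con.execute(OVERALL_TOP_N, {"cutoff": cutoff, "n": n}).fetchall()
    finally:
        con.close()


# Hypothetical sample rows. Group "c" has many rows, but all except three
# fall before the cutoff; group "a" dips to third place on 2016-12-27.
SAMPLE = (
    [("2016-12-20 12:00:00", "c")] * 10
    + [("2016-12-26 09:00:00", "a")] * 5
    + [("2016-12-26 10:00:00", "b")] * 2
    + [("2016-12-26 11:00:00", "c")]
    + [("2016-12-27 08:00:00", "b")] * 3
    + [("2016-12-27 09:00:00", "c")] * 2
    + [("2016-12-27 10:00:00", "a")]
)
```

Inside the window the totals are a = 6, b = 5, c = 3, so overall_top_n(SAMPLE, "2016-12-26 00:00:00", 2) reports only a and b. On 2016-12-27 a per-day ranking would pick b and c, but this query still reports a (count 1), because groups are ranked by their total over the whole window rather than day by day.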