Complex ranking in SQL using Postgres

I'm interested in the SQL needed for a complex ranking function. This is an app for a racing sport, where I need to rank each Entry for a Timesheet based on the entry's :total_time.

The relevant models:

class Timesheet 
    has_many :entries 
end 

class Entry 
    belongs_to :timesheet 
    belongs_to :athlete 
end 

class Run 
    belongs_to :entry 
end

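An entry's :total_time is not stored in the database. It is a calculated column, runs.sum(:finish). I use the Postgres (9.3) rank() function to get the entries for a given timesheet and rank them by that calculated column.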

def ranked_entries 
    Entry.find_by_sql([ 
    "SELECT *, rank() OVER (ORDER BY total_time asc) 
    FROM(
     SELECT Entries.id, Entries.timesheet_id, Entries.athlete_id, 
     SUM(Runs.finish) AS total_time 
     FROM Entries 
     INNER JOIN Runs ON (Entries.id = Runs.entry_id) 
     GROUP BY Entries.id) AS FinalRanks 
     WHERE timesheet_id = ?", self.id]) 
end

So far so good. This returns my Entry objects with a rank attribute, which I can display on timesheet#show.

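Now the tricky part. On a Timesheet, not every Entry will have the same number of runs. There is a cutoff (usually the top 20, but not always). This makes Postgres's rank() inaccurate, because some entries have a lower :total_time than the race winner simply because they didn't make the cutoff for the second heat.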

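My question: is it possible to do something like a rank() within a rank() to produce a table that looks like the one below? Or is there another preferred way? Thanks!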

Note: I store times as integers, but I have formatted them as the more familiar MM:SS in the simplified table below for clarity.

| rank | entry_id | total_time | 
|------|-----------|------------| 
| 1 |  6  | 1:59.05 | 
| 2 |  3  | 1:59.35 | 
| 3 |  17 | 1:59.52 | 
|......|...........|............| 
| 20 |  13 |  56.56 | <- didn't make the top-20 cutoff, only has one run.


2015-04-05 jktress

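It sounds like you shouldn't be selecting all rows (all runs?) in the first place. If you select the right rows (a selection that excludes every entry with only one run), then rank() will return the result you expect. In the context of your question, I'd say the preferred way is to select the right rows *first*, after which ranking is simple. – 2015-04-05 19:20:35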

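I'm selecting all rows because I want to include entries with only one run in the ranking. Every entry needs to be ranked regardless of its number of runs. The top 20 are ranked by total_time, while those from 21st on are ranked by the finish time of their first run. – jktress 2015-04-05 19:24:15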

What about just ranking on the average instead of the total? – 2015-04-05 19:45:27

Let's create a table. (Get in the habit of including CREATE TABLE and INSERT statements in all your SQL questions.)

create table runs (
    entry_id integer not null, 
    run_num integer not null 
    check (run_num between 1 and 3), 
    run_time interval not null 
); 

insert into runs values 
(1, 1, '00:59.33'), 
(2, 1, '00:59.93'), 
(3, 1, '01:03.27'), 
(1, 2, '00:59.88'), 
(2, 2, '00:59.27');

This SQL statement gives you the totals in the order you want, but without ranking them.

with num_runs as (
    select entry_id, count(*) as num_runs 
    from runs 
    group by entry_id 
) 
select r.entry_id, n.num_runs, sum(r.run_time) as total_time 
from runs r 
inner join num_runs n on n.entry_id = r.entry_id 
group by r.entry_id, n.num_runs 
order by num_runs desc, total_time asc

 
entry_id  num_runs  total_time
--------  --------  -----------
2         2         00:01:59.2
1         2         00:01:59.21
3         1         00:01:03.27

This statement adds a column for the rank.

with num_runs as (
    select entry_id, count(*) as num_runs 
    from runs 
    group by entry_id 
) 
select 
    rank() over (order by num_runs desc, sum(r.run_time) asc), 
    r.entry_id, n.num_runs, sum(r.run_time) as total_time 
from runs r 
inner join num_runs n on n.entry_id = r.entry_id 
group by r.entry_id, n.num_runs 
order by rank asc

 
rank  entry_id  num_runs  total_time
----  --------  --------  -----------
1     2         2         00:01:59.2
2     1         2         00:01:59.21
3     3         1         00:01:03.27


2015-04-05 19:48:00

Thanks, Mike! I'll experiment with this and let you know how it goes. – jktress 2015-04-05 20:03:50

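Beautiful work, thank you very much! One follow-up. This gets all the runs in the database, but I only need the runs for one timesheet. How would you recommend restricting the runs to those belonging to their entry's timesheet? Thanks again! – jktress 2015-04-05 20:48:09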

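Yes, along those lines. You will definitely need to do that in the common table expression, so that you get the right number of runs for each entry in the timesheet. Depending on how entry_id and timesheet_id are related, you might need the same WHERE clause in the main query as well. Alternatively, you could include timesheet_id in the CTE and join on entry_id *and* timesheet_id in the main query. – 2015-04-05 21:22:25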
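
The restriction discussed in the comments might look roughly like the query below. This is only a sketch, and it rests on assumptions: it uses the question's Rails tables (entries with id and timesheet_id, runs with entry_id and finish) rather than the simplified runs table from the answer, it folds the run count and the total into a single CTE, and it filters with a WHERE clause inside the CTE instead of joining on timesheet_id. The CTE name entry_totals and the literal timesheet id 42 are hypothetical.

```sql
-- Sketch only: assumes entries(id, timesheet_id) and runs(entry_id, finish)
-- as in the question's models. 42 is a placeholder timesheet id.
with entry_totals as (
    select e.id          as entry_id,
           count(*)      as num_runs,    -- how many runs this entry completed
           sum(r.finish) as total_time   -- the calculated :total_time
    from entries e
    inner join runs r on r.entry_id = e.id
    where e.timesheet_id = 42            -- only entries on this timesheet
    group by e.id
)
select
    -- entries with more runs rank first; ties on run count break on total time
    rank() over (order by num_runs desc, total_time asc) as rank,
    entry_id, num_runs, total_time
from entry_totals
order by rank asc;
```

Inside ranked_entries, the literal 42 would presumably become the ? placeholder already passed to find_by_sql, with self.id bound to it as in the question.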
