2008-09-17 188 views
24

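Calculating percentile rankings in MS SQL

What is the best way to calculate percentile rankings (e.g. the 90th percentile or the median score) in MS SQL 2005?

I'd like to be able to select the 25th percentile, median, and 75th percentile for a single column of scores (preferably in a single record, so I can combine them with the average, max, and min). For example, the result table might look like this: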

Group MinScore MaxScore AvgScore pct25 median pct75
----- -------- -------- -------- ----- ------ -----
T1    52       96       74       68    76     84
T2    48       98       74       68    75     85

Answers

0

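I would probably use SQL Server 2005's

-- 1.0 * avoids integer division, which would truncate the ratio to 0
1.0 * ROW_NUMBER() OVER (ORDER BY score) / (SELECT COUNT(*) FROM scores)

or something along those lines.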
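
As a rough, self-contained sketch of that expression in use, assuming a hypothetical table variable @scores with an integer score column (neither name comes from the question), each row gets the fraction of rows at or below its position in score order:

```sql
-- Hypothetical sample data; @scores and its column are assumptions for illustration.
DECLARE @scores TABLE (score INT);
INSERT INTO @scores VALUES (52);
INSERT INTO @scores VALUES (68);
INSERT INTO @scores VALUES (76);
INSERT INTO @scores VALUES (84);
INSERT INTO @scores VALUES (96);

-- Position of each row in score order, divided by the total row count.
-- 1.0 * forces decimal division instead of integer division.
SELECT score,
       1.0 * ROW_NUMBER() OVER (ORDER BY score)
           / (SELECT COUNT(*) FROM @scores) AS pct_rank
FROM @scores
ORDER BY score;
```

With these five distinct scores the fractions should come out as 0.2, 0.4, 0.6, 0.8 and 1.0. Tied scores receive different fractions under ROW_NUMBER(); RANK() could be swapped in if ties should share a value.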

0

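I would do something like this:

declare @n int, @median int, @p75 int, @p90 int

select @n = count(*) from tbl1
select @median = @n / 2
select @p75 = @n * 3 / 4
select @p90 = @n * 9 / 10

-- highest score among the lowest @median rows; TOP with a variable needs parentheses
select top 1 score
from (select top (@median) score from tbl1 order by score asc) as t
order by score desc

Is this correct?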

14

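I would think that this would be the simplest solution:

SELECT TOP N PERCENT * FROM TheTable ORDER BY TheScore DESC

where N = (100 - desired percentile). So if you wanted all rows in the 90th percentile, you would select the top 10%.

I'm not sure what you mean by "preferably in a single record". Do you mean calculating which percentile a given score for a single record would fall into? For example, do you want to be able to make statements like "your score is 83, which puts you in the 91st percentile"?

EDIT: OK, I thought some more about your question and came up with this interpretation. Are you asking how to calculate the cutoff score for a particular percentile? For example: to be in the 90th percentile, you must have a score greater than 78.

If so, this query works. I dislike subqueries though, so depending on what it was for, I would probably try to find a more elegant solution. It does, however, return a single record with a single score.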

-- Find the minimum score for all scores in the 90th percentile 
SELECT Min(subq.TheScore) FROM 
(SELECT TOP 10 PERCENT TheScore FROM TheTable 
ORDER BY TheScore DESC) AS subq 
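
If the goal is one record holding several cutoffs, the same TOP ... PERCENT subquery can be repeated once per percentile. The following is only a sketch under assumptions: @TheTable is a hypothetical table variable standing in for TheTable, the sample scores are made up, and each cutoff is the lowest score inside the corresponding top slice rather than an interpolated percentile.

```sql
-- Hypothetical stand-in for TheTable with made-up scores.
DECLARE @TheTable TABLE (TheScore INT);
INSERT INTO @TheTable VALUES (48);
INSERT INTO @TheTable VALUES (52);
INSERT INTO @TheTable VALUES (68);
INSERT INTO @TheTable VALUES (74);
INSERT INTO @TheTable VALUES (75);
INSERT INTO @TheTable VALUES (76);
INSERT INTO @TheTable VALUES (84);
INSERT INTO @TheTable VALUES (96);

-- One scalar subquery per cutoff: the lowest score in the top (100 - p) percent.
SELECT
    (SELECT MIN(q.TheScore)
     FROM (SELECT TOP 75 PERCENT TheScore FROM @TheTable
           ORDER BY TheScore DESC) AS q) AS pct25,
    (SELECT MIN(q.TheScore)
     FROM (SELECT TOP 50 PERCENT TheScore FROM @TheTable
           ORDER BY TheScore DESC) AS q) AS median,
    (SELECT MIN(q.TheScore)
     FROM (SELECT TOP 25 PERCENT TheScore FROM @TheTable
           ORDER BY TheScore DESC) AS q) AS pct75;
```

For these eight sample scores the query should return pct25 = 68, median = 75 and pct75 = 84. TOP n PERCENT rounds the row count up, so on small tables a cutoff can shift by one row.
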
1

I've been working on this a little more, and here is what I've come up with so far:

CREATE PROCEDURE [dbo].[TestGetPercentile] 

@percentile as float, 
@resultval as float output 

AS 

BEGIN 

WITH scores(score, prev_rank, curr_rank, next_rank) AS (
    SELECT dblScore, 
     (ROW_NUMBER() OVER (ORDER BY dblScore) - 1.0)/((SELECT COUNT(*) FROM TestScores) + 1) [prev_rank], 
     (ROW_NUMBER() OVER (ORDER BY dblScore) + 0.0)/((SELECT COUNT(*) FROM TestScores) + 1) [curr_rank], 
     (ROW_NUMBER() OVER (ORDER BY dblScore) + 1.0)/((SELECT COUNT(*) FROM TestScores) + 1) [next_rank] 
    FROM TestScores 
) 

SELECT @resultval = (
    SELECT TOP 1 
    CASE WHEN t1.score = t2.score 
     THEN t1.score 
    ELSE 
     t1.score + (t2.score - t1.score) * ((@percentile - t1.curr_rank)/(t2.curr_rank - t1.curr_rank)) 
    END 
    FROM scores t1, scores t2 
    WHERE (t1.curr_rank = @percentile OR (t1.curr_rank < @percentile AND t1.next_rank > @percentile)) 
     AND (t2.curr_rank = @percentile OR (t2.curr_rank > @percentile AND t2.prev_rank < @percentile)) 
) 

END 

Then in another stored procedure I do this:

DECLARE @pct25 float; 
DECLARE @pct50 float; 
DECLARE @pct75 float; 

exec TestGetPercentile .25, @pct25 output
exec TestGetPercentile .50, @pct50 output
exec TestGetPercentile .75, @pct75 output

Select 
    min(dblScore) as minScore, 
    max(dblScore) as maxScore, 
    avg(dblScore) as avgScore, 
    @pct25 as percentile25, 
    @pct50 as percentile50, 
    @pct75 as percentile75 
From TestScores 

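It still doesn't do quite what I'm looking for. This gets the statistics across all tests, whereas I would like to be able to select from a TestScores table that holds multiple different tests and get back the same statistics for each test (as in the example table in my question).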
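
One way to get the per-test breakdown described above in a single statement is to number the rows within each test and pick the row at the nearest rank for each percentile. This is a sketch, not the procedure above rewritten: it assumes a hypothetical testId column identifying each test, uses a table variable @TestScores with made-up data in place of the real table, and uses the nearest-rank definition (no interpolation), so results can differ slightly from the interpolated version.

```sql
-- Hypothetical stand-in for the real TestScores table; testId is an assumed column.
DECLARE @TestScores TABLE (testId VARCHAR(10), dblScore FLOAT);
INSERT INTO @TestScores VALUES ('T1', 52);
INSERT INTO @TestScores VALUES ('T1', 68);
INSERT INTO @TestScores VALUES ('T1', 76);
INSERT INTO @TestScores VALUES ('T1', 84);
INSERT INTO @TestScores VALUES ('T1', 96);
INSERT INTO @TestScores VALUES ('T2', 48);
INSERT INTO @TestScores VALUES ('T2', 68);
INSERT INTO @TestScores VALUES ('T2', 75);
INSERT INTO @TestScores VALUES ('T2', 85);
INSERT INTO @TestScores VALUES ('T2', 98);

-- rn is the row's position within its test, n the number of rows in that test.
-- For percentile p, the nearest-rank value is the score at position CEILING(p * n);
-- because rn rises with dblScore, MIN over rows with rn >= that position picks it.
SELECT testId,
       MIN(dblScore) AS minScore,
       MAX(dblScore) AS maxScore,
       AVG(dblScore) AS avgScore,
       MIN(CASE WHEN rn >= CEILING(0.25 * n) THEN dblScore END) AS pct25,
       MIN(CASE WHEN rn >= CEILING(0.50 * n) THEN dblScore END) AS median,
       MIN(CASE WHEN rn >= CEILING(0.75 * n) THEN dblScore END) AS pct75
FROM (
    SELECT testId, dblScore,
           ROW_NUMBER() OVER (PARTITION BY testId ORDER BY dblScore) AS rn,
           COUNT(*) OVER (PARTITION BY testId) AS n
    FROM @TestScores
    WHERE dblScore IS NOT NULL   -- NULLs would otherwise be ranked first
) AS ranked
GROUP BY testId
ORDER BY testId;
```

With this sample data the result should show pct25, median and pct75 of 68, 76, 84 for T1 and 68, 75, 85 for T2. Both ROW_NUMBER() and COUNT(*) OVER (PARTITION BY ...) are available in SQL Server 2005.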

1

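The 50th percentile is the same as the median. When computing another percentile, say the 80th, sort the data in ascending order to take the bottom 80 percent, sort it in descending order to take the top 20 percent, and average the two middle values where the slices meet.

Note: the median query has been around for a long time, but I cannot remember exactly where I got it from; I have only amended it to calculate other percentiles.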

DECLARE @Temp TABLE(Id INT IDENTITY(1,1), DATA DECIMAL(10,5)) 

INSERT INTO @Temp VALUES(0) 
INSERT INTO @Temp VALUES(2) 
INSERT INTO @Temp VALUES(8) 
INSERT INTO @Temp VALUES(4) 
INSERT INTO @Temp VALUES(3) 
INSERT INTO @Temp VALUES(6) 
INSERT INTO @Temp VALUES(6) 
INSERT INTO @Temp VALUES(6) 
INSERT INTO @Temp VALUES(7) 
INSERT INTO @Temp VALUES(0) 
INSERT INTO @Temp VALUES(1) 
INSERT INTO @Temp VALUES(NULL) 


--50th percentile or median 
SELECT ((
     SELECT TOP 1 DATA 
     FROM (
       SELECT TOP 50 PERCENT DATA 
       FROM @Temp 
       WHERE DATA IS NOT NULL 
       ORDER BY DATA 
       ) AS A 
     ORDER BY DATA DESC) + 
     (
     SELECT TOP 1 DATA 
     FROM (
       SELECT TOP 50 PERCENT DATA 
       FROM @Temp 
       WHERE DATA IS NOT NULL 
       ORDER BY DATA DESC 
       ) AS A 
     ORDER BY DATA ASC))/2.0 


--90th percentile 
SELECT ((
     SELECT TOP 1 DATA 
     FROM (
       SELECT TOP 90 PERCENT DATA 
       FROM @Temp 
       WHERE DATA IS NOT NULL 
       ORDER BY DATA 
       ) AS A 
     ORDER BY DATA DESC) + 
     (
     SELECT TOP 1 DATA 
     FROM (
       SELECT TOP 10 PERCENT DATA 
       FROM @Temp 
       WHERE DATA IS NOT NULL 
       ORDER BY DATA DESC 
       ) AS A 
     ORDER BY DATA ASC))/2.0 


--75th percentile 
SELECT ((
     SELECT TOP 1 DATA 
     FROM (
       SELECT TOP 75 PERCENT DATA 
       FROM @Temp 
       WHERE DATA IS NOT NULL 
       ORDER BY DATA 
       ) AS A 
     ORDER BY DATA DESC) + 
     (
     SELECT TOP 1 DATA 
     FROM (
       SELECT TOP 25 PERCENT DATA 
       FROM @Temp 
       WHERE DATA IS NOT NULL 
       ORDER BY DATA DESC 
       ) AS A 
     ORDER BY DATA ASC))/2.0 
9

Check out the NTILE command - it will give you percentiles pretty easily!

SELECT SalesOrderID, 
    OrderQty, 
    RowNum = Row_Number() OVER(Order By OrderQty), 
    Rnk = RANK() OVER(ORDER BY OrderQty), 
    DenseRnk = DENSE_RANK() OVER(ORDER BY OrderQty), 
    NTile4 = NTILE(4) OVER(ORDER BY OrderQty) 
FROM Sales.SalesOrderDetail 
WHERE SalesOrderID IN (43689, 63181) 
+4

Except NTILE doesn't give percentiles... – 2013-10-24 02:40:57

2

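How about this:

-- NTILE cannot be nested inside MAX, so it is computed in a derived table first
SELECT
    [Group],
    [75_percentile] = MAX(CASE WHEN quartile = 3 THEN score ELSE 0 END),
    [90_percentile] = MAX(CASE WHEN decile = 9 THEN score ELSE 0 END)
FROM (
    SELECT [Group], score,
        quartile = NTILE(4) OVER (PARTITION BY [Group] ORDER BY score ASC),
        decile = NTILE(10) OVER (PARTITION BY [Group] ORDER BY score ASC)
    FROM TheScore
) AS t
GROUP BY [Group]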