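Calculate a weighted (Bayesian) average score/index in a stored procedure?

I have an MS SQL Server 2008 database that stores places that serve food (cafés, restaurants, diners, etc.). On the website connected to this database, people can rate places on a scale from 1 to 3.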

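The website has a page where people can view a top list of the 25 best-rated places in a given city. The database structure looks like this (the tables store more information, but this is the relevant part): Database structure: Cities -> Places -> Votes

A place is located in a city, and a vote is cast on a place.

So far I have simply calculated the average vote score for each place by dividing the sum of all votes for a place by the number of votes for that place, like this (pseudocode):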

vote_count = total number of votes for the place 
vote_sum = total sum of all the votes for the place 

vote_score = vote_sum/vote_count 

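If a place has no votes, I also have to handle division by zero. All of this is done in a stored procedure that also fetches the other data I want to show in the top list. Here is the current stored procedure, which fetches the 25 places with the highest vote score: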

ALTER PROCEDURE [dbo].[GetTopListByCity] 
    (
    @city_id Int 
    ) 
AS 
    SELECT TOP 25 dbo.Places.place_id, 
      dbo.Places.city_id, 
      dbo.Places.place_name, 
      dbo.Places.place_alias, 
      dbo.Places.place_street_address, 
      dbo.Places.place_street_number, 
      dbo.Places.place_zip_code, 
      dbo.Cities.city_name, 
      dbo.Cities.city_alias, 
      dbo.Places.place_phone, 
      dbo.Places.place_lat, 
      dbo.Places.place_lng, 
      ISNULL(SUM(dbo.Votes.vote_score),0) AS vote_sum, 
      (SELECT COUNT(*) FROM dbo.Votes WHERE dbo.Votes.place_id = dbo.Places.place_id) AS vote_count, 
      COALESCE((CONVERT(FLOAT,SUM(dbo.Votes.vote_score))/(CONVERT(FLOAT,(SELECT COUNT(*) FROM dbo.Votes WHERE dbo.Votes.place_id = dbo.Places.place_id)))),0) AS vote_score 

    FROM dbo.Places INNER JOIN dbo.Cities ON dbo.Places.city_id = dbo.Cities.city_id 
    LEFT OUTER JOIN dbo.Votes ON dbo.Places.place_id = dbo.Votes.place_id 
    WHERE dbo.Places.city_id = @city_id 
    AND dbo.Places.hidden = 0 
    GROUP BY dbo.Places.place_id, 
      dbo.Places.city_id, 
      dbo.Places.place_name, 
      dbo.Places.place_alias, 
      dbo.Places.place_street_address, 
      dbo.Places.place_street_number, 
      dbo.Places.place_zip_code, 
      dbo.Cities.city_name, 
      dbo.Cities.city_alias, 
      dbo.Places.place_phone, 
      dbo.Places.place_lat, 
      dbo.Places.place_lng 
    ORDER BY vote_score DESC, vote_count DESC, place_name ASC 

    RETURN 

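As you can see, it fetches more than just the vote score: I also need data about the place, the city it is located in, and so on. This works fine, but there is one big problem: the vote score is too simple because it does not take the number of votes into account. With the simple calculation, a place with a single vote of score 3 ends up higher on the list than a place with fourteen votes of score 3 and one vote of score 2:

3/1 = 3 
(14*3 + 1*2)/15 = 44/15 = 2.933333333333 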

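To solve this, I have been looking into using some form of weighted average/weighted index. I found an example of a "true Bayesian estimate" that looks promising. It looks like this: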

weighted rating (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C 

where: 

R = average for the place (mean) = (Rating) 
v = number of votes for the place = (votes) 
m = minimum number of votes required to be listed in the Top 25 (unsure how many, but somewhere between 2-5 seems realistic) 
C = the mean vote across the whole database 
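
As a rough illustration of the formula above, here is a minimal Python sketch (it is not part of the stored procedure). The function name weighted_rating, the fallback to C for places without votes, and the sample values m = 3 and C = 2.5 are assumptions chosen only for this example; the two places are the ones from the 3/1 versus 44/15 comparison.

```python
def weighted_rating(vote_sum, vote_count, m, c):
    """WR = (v / (v + m)) * R + (m / (v + m)) * C."""
    if vote_count == 0:
        # Assumption: with no votes R is undefined, so fall back to C.
        return c
    v = vote_count
    r = vote_sum / vote_count  # R, the plain average for the place
    return (v / (v + m)) * r + (m / (v + m)) * c


# Assumed values for illustration only (not taken from the database).
M = 3    # weight given to the global mean ("minimum votes")
C = 2.5  # hypothetical mean vote across all places

one_vote = weighted_rating(3, 1, M, C)          # a single vote of 3
fifteen_votes = weighted_rating(44, 15, M, C)   # fourteen 3s and one 2

print(round(one_vote, 4), round(fifteen_votes, 4))  # 2.625 2.8611
```

With these assumed values the fifteen-vote place scores about 2.86 and the one-vote place 2.625, so the order is reversed compared with the simple average.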

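The problems start when I try to implement this weighted rating in the stored procedure: it quickly becomes complicated, I get tangled up in parentheses, and I more or less lose track of what the stored procedure is doing.

Now I need some help with two questions:

Is this an appropriate method for calculating a weighted index for my site?

What would this (or another suitable calculation method) look like when implemented in a stored procedure?

Answers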

1

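I can't see anything wrong with your calculation, but I can see that you are doing the same thing several times. My suggestion is to do the aggregation in one place, which makes the select much simpler.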

;WITH CTE AS 
(
    SELECT 
     SUM(dbo.Votes.vote_score) AS SumOfVoteScore, 
     COUNT(*) AS CountOfVotes, 
     Votes.place_id 
    FROM 
     Votes 
    GROUP BY 
     Votes.place_id 
) 
SELECT TOP 25 
    dbo.Places.place_id, 
    dbo.Places.city_id, 
    dbo.Places.place_name, 
    dbo.Places.place_alias, 
    dbo.Places.place_street_address, 
    dbo.Places.place_street_number, 
    dbo.Places.place_zip_code, 
    dbo.Cities.city_name, 
    dbo.Cities.city_alias, 
    dbo.Places.place_phone, 
    dbo.Places.place_lat, 
    dbo.Places.place_lng, 
    ISNULL(CTE.SumOfVoteScore,0) AS vote_sum, 
    CTE.CountOfVotes AS vote_count, 
    COALESCE((CONVERT(FLOAT,CTE.SumOfVoteScore)/ 
    (CONVERT(FLOAT,CTE.CountOfVotes))),0) AS vote_score 

FROM dbo.Places INNER JOIN dbo.Cities ON dbo.Places.city_id = dbo.Cities.city_id 
LEFT JOIN CTE ON dbo.Places.place_id=CTE.place_id 
WHERE dbo.Places.city_id = @city_id 
AND dbo.Places.hidden = 0 
GROUP BY dbo.Places.place_id, 
     dbo.Places.city_id, 
     dbo.Places.place_name, 
     dbo.Places.place_alias, 
     dbo.Places.place_street_address, 
     dbo.Places.place_street_number, 
     dbo.Places.place_zip_code, 
     dbo.Cities.city_name, 
     dbo.Cities.city_alias, 
     dbo.Places.place_phone, 
     dbo.Places.place_lat, 
     dbo.Places.place_lng, 
     CTE.SumOfVoteScore, 
     CTE.CountOfVotes 
ORDER BY vote_score DESC, vote_count DESC, place_name ASC 

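The CTE lets us reuse the calculation, so we don't have to write SUM(vote_score) and SELECT COUNT(*) FROM Votes WHERE ... multiple times. The final select is then easy to follow.

I hope this helps.

Edit

You do not have to define the column list for the CTE. Writing CTE (SumOfVoteScore, CountOfVotes, place_id) AS works just as well as CTE AS. You do need to define the columns if you use a recursive CTE, because there you UNION with the other part.

For reference, you can find more about CTEs here and here.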
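
The aggregate-once idea from this answer can be tried without a SQL Server instance. The following is a minimal sketch that uses SQLite through Python's sqlite3 module, so it is not T-SQL: COALESCE and CAST stand in for ISNULL and CONVERT, and the two tiny tables and their rows are made up for the example. It runs the query once with an explicit CTE column list and once without, to show that both forms return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down versions of the Places and Votes tables.
conn.executescript("""
    CREATE TABLE Places (place_id INTEGER PRIMARY KEY, place_name TEXT);
    CREATE TABLE Votes  (place_id INTEGER, vote_score INTEGER);
    INSERT INTO Places VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO Votes  VALUES (1, 3), (2, 3), (2, 2);
""")

# The per-place aggregation happens once, inside the CTE.
query_template = """
WITH CTE{cols} AS (
    SELECT SUM(vote_score) AS SumOfVoteScore,
           COUNT(*)        AS CountOfVotes,
           place_id
    FROM Votes
    GROUP BY place_id
)
SELECT p.place_name,
       COALESCE(CTE.SumOfVoteScore, 0) AS vote_sum,
       COALESCE(CTE.CountOfVotes, 0)   AS vote_count,
       COALESCE(CAST(CTE.SumOfVoteScore AS REAL) / CTE.CountOfVotes, 0)
           AS vote_score
FROM Places AS p
LEFT JOIN CTE ON p.place_id = CTE.place_id
ORDER BY vote_score DESC, vote_count DESC, p.place_name ASC
"""

with_cols = conn.execute(
    query_template.format(cols=" (SumOfVoteScore, CountOfVotes, place_id)")
).fetchall()
without_cols = conn.execute(query_template.format(cols="")).fetchall()

for row in with_cols:
    print(row)
```

Place C has no votes, so the LEFT JOIN yields NULLs for it and the COALESCE calls turn them into 0, which mirrors the division-by-zero handling discussed in the question.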

0

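Thanks for the information!

I had been looking at CTEs, but I just didn't realise they were what I was looking for! It's always good to learn something new, and I know I will use CTEs in other projects. When I implemented your CTE in the stored procedure, I ended up with this code: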

ALTER PROCEDURE dbo.GetTopListByCityCTE 
    (
    @city_id Int 
    ) 
AS 

;WITH CTE (SumOfVoteScore, CountOfVotes, place_id) AS 
(
    SELECT 
     SUM(dbo.Votes.vote_score) AS SumOfVoteScore, 
     COUNT(*) AS CountOfVotes, 
     Votes.place_id 
    FROM 
     Votes 
    GROUP BY 
     Votes.place_id 

) 

SELECT TOP 25 
    dbo.Places.place_id, 
    dbo.Places.city_id, 
    dbo.Places.place_name, 
    dbo.Places.place_alias, 
    dbo.Places.place_street_address, 
    dbo.Places.place_street_number, 
    dbo.Places.place_zip_code, 
    dbo.Cities.city_name, 
    dbo.Cities.city_alias, 
    dbo.Places.place_phone, 
    dbo.Places.place_lat, 
    dbo.Places.place_lng, 
    ISNULL(CTE.SumOfVoteScore,0) AS vote_sum, 
    CTE.CountOfVotes AS vote_count, 
    COALESCE((CONVERT(FLOAT,CTE.SumOfVoteScore)/ 
    (CONVERT(FLOAT,CTE.CountOfVotes))),0) AS vote_score 

FROM dbo.Places INNER JOIN dbo.Cities ON dbo.Places.city_id = dbo.Cities.city_id 
LEFT JOIN CTE ON dbo.Places.place_id = CTE.place_id 
WHERE dbo.Places.city_id = @city_id 
AND dbo.Places.hidden = 0 
GROUP BY dbo.Places.place_id, 
     dbo.Places.city_id, 
     dbo.Places.place_name, 
     dbo.Places.place_alias, 
     dbo.Places.place_street_address, 
     dbo.Places.place_street_number, 
     dbo.Places.place_zip_code, 
     dbo.Cities.city_name, 
     dbo.Cities.city_alias, 
     dbo.Places.place_phone, 
     dbo.Places.place_lat, 
     dbo.Places.place_lng, 
     CTE.SumOfVoteScore, 
     CTE.CountOfVotes 
ORDER BY vote_score DESC, vote_count DESC, place_name ASC 

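A quick check shows that it returns the same results as the earlier code, but it is easier to read and follow, and hopefully more efficient.

Now I will have to experiment with replacing the old (simple) rating calculation with a new one that takes the number of votes into account.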

Do that. Glad to help. If you are happy with my answer, maybe you could consider accepting it? – Arion 2012-04-02 10:33:38

And if you look at my answer, you will see that I have updated it. – Arion 2012-04-02 10:44:06

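I just want to make sure that the CTE helps me solve the original problem (implementing a more complex score index) before I mark the answer as the solution. I'm working on the new stored procedure... – tkahn 2012-04-02 10:47:42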

0

OK, so here is the stored procedure I came up with:

ALTER PROCEDURE dbo.GetTopListByCityCTE 
    (
    @city_id Int 
    ) 
AS 

DECLARE @MinimumNumber float; 
DECLARE @TotalNumberOfVotes int; 
DECLARE @AverageRating float; 
DECLARE @AverageNumberOfVotes float; 

/* MINIMUM NUMBER */ 
SET @MinimumNumber = 1; 

/* TOTAL NUMBER OF VOTES -- ALL PLACES */ 
SET @TotalNumberOfVotes = (
    SELECT COUNT(*) FROM Votes 
); 

/* AVERAGE RATING -- ALL PLACES */ 
SET @AverageRating = (
    SELECT 
     CONVERT(FLOAT,(SUM(dbo.Votes.vote_score)))/CONVERT(FLOAT,COUNT(*)) AS AverageRating 
    FROM 
     Votes); 

/* AVERAGE NUMBER OF VOTES -- ALL PLACES */ 
/* CURRENTLY NOT USED IN INDEX - KEPT FOR REFERENCE */ 
SET @AverageNumberOfVotes = (
    SELECT AVG(CONVERT(FLOAT,NumberOfVotes)) FROM (SELECT COUNT(*) AS NumberOfVotes FROM Votes GROUP BY place_id) AS AverageNumberOfVotes 

); 
/* SUM OF ALL VOTE SCORES AND COUNT OF ALL VOTES -- INDIVIDUAL PLACES */ 
WITH CTE AS (
    SELECT 
     CONVERT(FLOAT, SUM(dbo.Votes.vote_score)) AS SumVotesForPlace, 
     CONVERT(FLOAT, COUNT(*)) AS CountVotesForPlace, 
     Votes.place_id 
    FROM 
     Votes 
    GROUP BY 
     Votes.place_id 
) 

SELECT 
    dbo.Places.place_id, 
    dbo.Places.city_id, 
    dbo.Places.place_name, 
    dbo.Places.place_alias, 
    dbo.Places.place_street_address, 
    dbo.Places.place_street_number, 
    dbo.Places.place_zip_code, 
    dbo.Cities.city_name, 
    dbo.Cities.city_alias, 
    dbo.Places.place_phone, 
    dbo.Places.place_lat, 
    dbo.Places.place_lng, 
    ISNULL(CTE.SumVotesForPlace,0) AS vote_sum, 
    ISNULL(CTE.CountVotesForPlace,0) AS vote_count, 
    COALESCE((CTE.SumVotesForPlace/ 
    CTE.CountVotesForPlace),0) AS vote_score, 
    ISNULL((CTE.CountVotesForPlace/(CTE.CountVotesForPlace + @MinimumNumber)) * (COALESCE((CTE.SumVotesForPlace/CTE.CountVotesForPlace),0)) + (@MinimumNumber/(CTE.CountVotesForPlace + @MinimumNumber)) * @AverageRating,0) AS WeightedIndex 

FROM dbo.Places INNER JOIN dbo.Cities ON dbo.Places.city_id = dbo.Cities.city_id 
LEFT JOIN CTE ON dbo.Places.place_id = CTE.place_id 
WHERE dbo.Places.city_id = @city_id 
AND dbo.Places.hidden = 0 
GROUP BY dbo.Places.place_id, 
     dbo.Places.city_id, 
     dbo.Places.place_name, 
     dbo.Places.place_alias, 
     dbo.Places.place_street_address, 
     dbo.Places.place_street_number, 
     dbo.Places.place_zip_code, 
     dbo.Cities.city_name, 
     dbo.Cities.city_alias, 
     dbo.Places.place_phone, 
     dbo.Places.place_lat, 
     dbo.Places.place_lng, 
     CTE.SumVotesForPlace, 
     CTE.CountVotesForPlace 
ORDER BY WeightedIndex DESC, vote_count DESC, place_name ASC 

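There is a variable called @AverageNumberOfVotes that is not used in the calculation, but I keep it for reference in case it is needed.

Running this against the data I have gives a result that differs slightly from the earlier one, but it is not revolutionary and not what I need. Here are the first 10 rows returned when I execute the SP above: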

vote_sum  vote_count  vote_score        WeightedIndex 
1110      409         2,71393643031785  2,7140960047496 
807       310         2,60322580645161  2,60449697749787 
38        15          2,53333333333333  2,56708633093525 
25        10          2,5               2,55442722744881 
2         1           2                 2,55188848920863 
2         1           2                 2,55188848920863 
2         1           2                 2,55188848920863 
2         1           2                 2,55188848920863 
2         1           2                 2,55188848920863 
2         1           2                 2,55188848920863 

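The problem here seems to be that a place with only one vote, with a score of 2, gets a weighted index of 2,55188848920863?

The formula for calculating the index is taken from IMDB (http://www.imdb.com/chart/top), so I suppose that either I am doing something wrong, or the data in my database is not comparable (in number of votes or in voting scale) to the data IMDB has?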
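
The posted rows can be checked by hand. Solving the formula against the rows with 10 votes and 1 vote suggests the output was produced with m = 3 and a global mean C of roughly 2.7359, not the m = 1 that appears in the procedure. Both values are inferred from the numbers and are not stated in the post, so treat them as assumptions. Under that assumption, this minimal Python sketch (the name weighted_index is made up) reproduces the WeightedIndex column and shows why a single vote of 2 lands near 2.55.

```python
# Inferred from the posted output, not stated in the post (assumptions):
M = 3                 # weight given to the global mean
C = 2.73585131894484  # mean vote across all places


def weighted_index(vote_sum, vote_count, m=M, c=C):
    """(v/(v+m))*R + (m/(v+m))*C, rewritten as (sum + m*C) / (v + m)."""
    return (vote_sum + m * c) / (vote_count + m)


# (vote_sum, vote_count, WeightedIndex as posted, with decimal points)
rows = [
    (1110, 409, 2.7140960047496),
    (807, 310, 2.60449697749787),
    (38, 15, 2.56708633093525),
    (25, 10, 2.55442722744881),
    (2, 1, 2.55188848920863),
]

for vote_sum, vote_count, posted in rows:
    computed = weighted_index(vote_sum, vote_count)
    print(vote_count, round(computed, 6), round(posted, 6))

# Share of the index that comes from C when a place has a single vote.
prior_share_one_vote = M / (1 + M)
```

If these inferred values are right, three quarters of the index for a one-vote place comes from C, so a single vote of 2 moves the index only a quarter of the way from C down toward 2. That is the formula behaving as designed rather than a bug in the arithmetic.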

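Edit

Is there a way to tweak this function so that it works better for me? Is there a different function/method that would work better? I still need to do the calculation in a stored procedure.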

I'm not sure that this formula (what IMDB calls a "true Bayesian estimate") is what I need, and there is criticism of it: http://en.wikipedia.org/wiki/Bayes_estimator#Practical_example_of_misapplication_of_Bayes_estimators – tkahn 2012-04-02 13:50:29