How to query comments Stack Overflow style?

I saw this question on meta: https://meta.stackexchange.com/questions/33101/how-does-so-query-comments

I wanted to set the record straight and ask the question in a proper technical way.

Say I have 2 tables:

 
Posts 
id 
content 
parent_id   (null for questions, question_id for answer) 

Comments 
id 
body 
is_deleted 
post_id 
upvotes 
date

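Note: I think this is more or less how the SO schema is set up. Answers have a parent_id, which is the question; questions have null there. Questions and answers are stored in the same table.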

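How do I pull out comments Stack Overflow style, very efficiently and with minimal round trips?

The rules:

A single query should pull out everything needed for rendering
Only 5 comments per answer need to be pulled out, with preference for upvoted ones
It needs to provide enough information to tell the user that there are more comments beyond the 5 shown (and the actual count, e.g. 2 more comments)
Sorting is really hairy for comments, as you can see in the comments on this question. The rule is: display comments by date, but if a comment has upvotes it gets preferential treatment and is displayed as well, at the bottom of the list. (This is hard to express in SQL.)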
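To make the selection rule concrete, here is a minimal Python sketch of one possible reading of it: pick up to five comments per post, preferring upvoted ones, then display the chosen ones by date and report how many are hidden. The `Comment` class and `visible_comments` function are hypothetical names for illustration, and the tie-break (older comments win) is an assumption, not something the rules state.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Comment:
    id: int
    upvotes: int
    date: datetime


def visible_comments(comments, limit=5):
    """Return (shown, hidden_count) for one post's non-deleted comments."""
    # Prefer upvoted comments; among equals, older comments win (assumption).
    chosen = sorted(comments, key=lambda c: (-c.upvotes, c.date))[:limit]
    # Display the chosen comments in chronological order.
    shown = sorted(chosen, key=lambda c: c.date)
    return shown, len(comments) - len(shown)


# Seven comments posted one minute apart, with varying upvotes.
comments = [
    Comment(id=i, upvotes=u, date=datetime(2009, 12, 16, 23, i))
    for i, u in enumerate([0, 3, 0, 0, 1, 0, 2])
]
shown, hidden = visible_comments(comments)
print([c.id for c in shown], hidden)
```

Here the three upvoted comments plus the two oldest unvoted ones are kept, and the hidden count is what would feed a "2 more comments" link.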

If any denormalizations would make this better, what are they? Which indexes are critical?


2009-12-16 Sam Saffron

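@Mark: SO is set up with questions and answers in the same table. – 2009-12-16 23:03:10

SO has questions, answers and comments. What is a "post"? Is it a question? An answer? Both? How do I know which posts belong to which question? – 2009-12-16 23:03:56

@OMG Ponies, OK, I did not know that. – 2009-12-16 23:04:33

Use: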

WITH post_hierarchy AS (
    SELECT p.id, 
     p.content, 
     p.parent_id, 
     1 AS post_level 
    FROM POSTS p 
    WHERE p.parent_id IS NULL 
    UNION ALL 
    SELECT p.id, 
     p.content, 
     p.parent_id, 
     ph.post_level + 1 AS post_level 
    FROM POSTS p 
    JOIN post_hierarchy ph ON ph.id = p.parent_id) 
SELECT ph.id, 
     ph.post_level, 
     c.upvotes, 
     c.body 
    FROM COMMENTS c 
    JOIN post_hierarchy ph ON ph.id = c.post_id 
ORDER BY ph.post_level, c.date

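A few things to be aware of:

Stack Overflow displays the first 5 comments; it does not matter whether they were upvoted or not. Subsequent comments that were upvoted are displayed immediately
You can't accommodate a limit of 5 comments per post without devoting a SELECT to each post. Adding TOP 5 to what I posted will only return the first five rows based on the ORDER BY clause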
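The second point can be checked with a small experiment. This is only a sketch: it uses Python's built-in sqlite3 module and SQLite's LIMIT as a stand-in for TOP, and the table and data are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT)"
)
# Two posts with seven comments each (hypothetical data).
conn.executemany(
    "INSERT INTO comments (post_id, body) VALUES (?, ?)",
    [(post_id, f"comment {n}") for post_id in (1, 2) for n in range(7)],
)

# The row limit applies to the whole ordered result, not to each post.
rows = conn.execute(
    "SELECT post_id, body FROM comments ORDER BY post_id, id LIMIT 5"
).fetchall()
posts_seen = {post_id for post_id, _ in rows}
print(len(rows), posts_seen)
```

All five rows come from the first post, so the second post gets no comments at all, which is why a plain row limit cannot express "five per post".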


2009-12-16 22:59:15

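I wouldn't bother filtering the comments in SQL (which may surprise you, because I'm an SQL advocate). Just fetch them all sorted by CommentId, and filter them in application code.

It's actually pretty rare for a given post to have more than five comments, so filtering is seldom needed. In StackOverflow's October data dump, 78% of posts have zero or one comment, and 97% have five or fewer comments. Only 20 posts have >= 50 comments, and only two posts have more than 100 comments.

So writing complex SQL to do that kind of filtering would add complexity to the query for every post. I'm all for using clever SQL when appropriate, but this would be penny-wise and pound-foolish.

You could do it this way: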

SELECT q.PostId, a.PostId, c.CommentId 
FROM Posts q 
LEFT OUTER JOIN Posts a 
    ON (a.ParentId = q.PostId) 
LEFT OUTER JOIN Comments c 
    ON (c.PostId IN (q.PostId, a.PostId)) 
WHERE q.PostId = 1234 
ORDER BY q.PostId, a.PostId, c.CommentId;

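But this gives you redundant copies of the q and a columns, which is significant because those columns include text blobs. The extra cost of copying redundant text from the RDBMS to the application becomes significant.

So it's probably better to do this in two queries instead. Given that the client is viewing a question with PostId = 1234, do the following: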

SELECT c.PostId, c.Text 
FROM Comments c 
JOIN (SELECT 1234 AS PostId UNION ALL 
    SELECT a.PostId FROM Posts a WHERE a.ParentId = 1234) p 
    ON (c.PostId = p.PostId);

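Then sort through them in application code, collecting them by the post they reference and filtering out the extra comments beyond the five most interesting ones per post.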
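As a rough illustration of that application-side step, here is a Python sketch. It assumes the rows arrive as (PostId, Text) pairs in CommentId order, and it treats "the first five in arrival order" as a stand-in for "the five most interesting"; `group_and_trim` is a hypothetical helper name.

```python
from collections import defaultdict


def group_and_trim(rows, limit=5):
    """Map post_id -> (first `limit` comment texts, number of hidden comments)."""
    by_post = defaultdict(list)
    for post_id, text in rows:
        by_post[post_id].append(text)
    return {
        post_id: (texts[:limit], max(0, len(texts) - limit))
        for post_id, texts in by_post.items()
    }


# Seven comments on the question (1234), two on one answer (1300).
rows = [(1234, f"q{n}") for n in range(7)] + [(1300, "a0"), (1300, "a1")]
result = group_and_trim(rows)
print(result[1234][1], result[1300])
```

A single pass over the rows is enough, and since most posts have few comments the trimming rarely does anything.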

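I tested these two queries against a MySQL 5.1 database loaded with StackOverflow's data dump from October. The first query takes about 50 seconds. The second query is pretty much instantaneous (after I had pre-cached the indexes for the Posts and Comments tables).

The bottom line is that insisting on fetching all the data you need with a single SQL query is an artificial requirement (probably based on the misconception that the round trip of issuing a query to an RDBMS is overhead that must be minimized at any cost). Often a single query is the less efficient solution. Do you try to write all your application code in a single function? :-)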


2009-12-16 23:26:09

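I agree with you. My implementation would actually be a slight optimisation: I would store comment_count in the posts table. On the client, pull out all the posts to render, go through them, and then do a select * from comments where post_id in (id1, id2, id3) (all posts with more than 0 comments). This makes things super simple and very efficient for the general case. – 2009-12-16 23:40:50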
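A minimal sketch of that idea, using Python's built-in sqlite3 module. It assumes the posts for the page are already loaded as (post_id, comment_count) pairs; `fetch_comments` and the tiny test table are hypothetical and only for illustration.

```python
import sqlite3


def fetch_comments(conn, posts):
    """posts: iterable of (post_id, comment_count) already loaded for the page."""
    ids = [post_id for post_id, comment_count in posts if comment_count > 0]
    if not ids:
        return []  # nothing to fetch, so no round trip at all
    placeholders = ", ".join("?" for _ in ids)
    sql = (
        "SELECT post_id, body FROM comments "
        f"WHERE post_id IN ({placeholders}) AND is_deleted = 0 "
        "ORDER BY post_id, id"
    )
    return conn.execute(sql, ids).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, "
    "body TEXT, is_deleted INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO comments (post_id, body) VALUES (?, ?)",
    [(1, "first"), (1, "second"), (3, "third")],
)
# Post 2 has comment_count = 0, so it is left out of the IN list.
rows = fetch_comments(conn, [(1, 2), (2, 0), (3, 1)])
print(rows)
```

The denormalized count also gives the "N more comments" number without touching the Comments table.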

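The real problem is not the query but the schema, especially the clustered indexes. The comment ordering requirement is ambiguous as you defined it (is it only 5 per answer?). I interpreted the requirements as "pull 5 comments per post (answer or question), giving preference to upvoted ones, then to newer ones". I know this is not how SO comments are shown, but you have to define your requirements more precisely.

Here is my query: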

declare @postId int; 
set @postId = ?; 

with cteQuestionAndReponses as (
    select post_id 
    from Posts 
    where post_id = @postId 
    union all 
    select post_id 
    from Posts 
    where parent_id = @postId) 
select * from 
cteQuestionAndReponses p 
outer apply (
    select count(*) as CommentsCount 
    from Comments c 
    where is_deleted = 0 
    and c.post_id = p.post_id) as cc 
outer apply (
    select top(5) * 
    from Comments c 
    where is_deleted = 0 
    and p.post_id = c.post_id 
    order by upvotes desc, date desc 
) as c

I have some 14k posts and 67k comments in my test tables, and the query gets the posts in 7 ms:

Table 'Comments'. Scan count 12, logical reads 50, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 'Posts'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times: 
    CPU time = 0 ms, elapsed time = 7 ms.

Here is the schema I tested with:

create table Posts (
post_id int identity (1,1) not null 
, content varchar(max) not null 
, parent_id int null -- (null for questions, question_id for answer) 
, constraint fkPostsParent_id 
    foreign key (parent_id) 
    references Posts(post_id) 
, constraint pkPostsId primary key nonclustered (post_id) 
); 
create clustered index cdxPosts on 
    Posts(parent_id, post_id); 
go 

create table Comments (
comment_id int identity(1,1) not null 
, body varchar(max) not null 
, is_deleted bit not null default 0 
, post_id int not null 
, upvotes int not null default 0 
, date datetime not null default getutcdate() 
, constraint pkComments primary key nonclustered (comment_id) 
, constraint fkCommentsPostId 
    foreign key (post_id) 
    references Posts(post_id) 
); 
create clustered index cdxComments on 
    Comments (is_deleted, post_id, upvotes, date, comment_id); 
go
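To show why the key order of cdxComments matters for the query above, here is a toy Python model. It is only an analogy under stated assumptions: a sorted list stands in for the clustered B-tree, `bisect` stands in for an index seek, and the rows are made up. How the real optimizer reads the index is not something this sketch can confirm.

```python
from bisect import bisect_left

# Rows kept sorted by the same key as cdxComments:
# (is_deleted, post_id, upvotes, date, comment_id)
index = sorted([
    (0, 7, 0, "2009-12-01", 1),
    (0, 7, 3, "2009-12-02", 2),
    (0, 7, 3, "2009-12-05", 3),
    (0, 8, 1, "2009-12-03", 4),
    (1, 7, 9, "2009-12-04", 5),  # deleted, so it sorts outside the live range
])


def top_comments(index, post_id, limit=5):
    """Return up to `limit` live comments for post_id, best first."""
    lo = bisect_left(index, (0, post_id))      # first live row for the post
    hi = bisect_left(index, (0, post_id + 1))  # one past its last live row
    # Reading the range backwards yields upvotes desc, date desc with no sort.
    return index[lo:hi][::-1][:limit]


print([row[4] for row in top_comments(index, 7)])
```

Because the live comments of one post are contiguous and already ordered by upvotes and date, the top five can be read off one end of the range.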

Here is my test data generation:

insert into Posts (content) 
select 'Lorem Ipsum' 
from master..spt_values; 

insert into Posts (content, parent_id) 
select 'Ipsum Lorem', post_id 
from Posts p 
cross apply (
    select top(abs(checksum(newid(), p.post_id)) % 10) Number 
    from master..spt_values) as r 
where parent_id is NULL 

insert into Comments (body, is_deleted, post_id, upvotes, date) 
select 'Sit Amet' 
    -- 5% deleted comments 
    , case when abs(checksum(newid(), p.post_id, r.Number)) % 100 >= 95 then 1 else 0 end 
    , p.post_id 
    -- up to 10 upvotes 
    , abs(checksum(newid(), p.post_id, r.Number)) % 10 
    -- up to 1 year old posts 
    , dateadd(minute, -abs(checksum(newid(), p.post_id, r.Number) % 525600), getutcdate()) 
from Posts p 
cross apply (
    select top(abs(checksum(newid(), p.post_id)) % 10) Number 
    from master..spt_values) as r


2009-12-17 01:00:57
