2010-10-11

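MySQL include/exclude posts

This is going to take a while to type out because I want to be as clear as possible, so please bear with me if anything is still unclear.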

Basically, I have a table of posts in a database, and users can add privacy settings to their posts.

ID | owner_id | post | other_info | privacy_level (int value) 

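From there, users can add their own privacy settings, allowing a post to be viewed by everyone (privacy_level = 0), friends (privacy_level = 1), no one (privacy_level = 3), or specific people or filters (privacy_level = 4). For the level that names specific people (4), the query references the table "post_privacy_includes_for" in a subquery to check whether the user (or a filter the user belongs to) exists in that table.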

ID | post_id | user_id | list_id 

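In addition, users need the ability to prevent certain people in a larger group from viewing their posts by excluding them (for example, a post set to be viewable by everyone but hidden from a stalker). For this, another reference table, "post_privacy_exclude_from", was added; it is set up exactly like "post_privacy_includes_for".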
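To make the rules above concrete, here is one reading of them as a plain Python predicate evaluated per post. The Post class, the in-memory sets, and the choice that exclusions override the privacy level are assumptions for illustration only; in the real system each of these checks is one of the EXISTS subqueries against post_privacy_includes_for, post_privacy_exclude_from and the friends data. Owners always seeing their own posts mirrors the owner_id branches in the query.

```python
from dataclasses import dataclass, field

# Privacy levels as defined in the question.
EVERYONE, FRIENDS, NOBODY, SPECIFIC = 0, 1, 3, 4


@dataclass
class Post:
    id: int
    owner_id: int
    privacy_level: int
    # Hypothetical in-memory copies of this post's include/exclude rows,
    # split into direct user ids and filter (list) ids.
    include_users: set = field(default_factory=set)
    include_lists: set = field(default_factory=set)
    exclude_users: set = field(default_factory=set)
    exclude_lists: set = field(default_factory=set)


def can_view(post, viewer_id, friends_of, lists_of):
    """friends_of maps an owner id to that owner's friends;
    lists_of maps a viewer id to the filter lists the viewer belongs to."""
    if post.owner_id == viewer_id:
        return True  # owners always see their own posts
    viewer_lists = lists_of.get(viewer_id, set())
    # Exclusions override the level, e.g. a public post hidden from one stalker.
    if viewer_id in post.exclude_users or (viewer_lists & post.exclude_lists):
        return False
    if post.privacy_level == EVERYONE:
        return True
    if post.privacy_level == FRIENDS:
        return viewer_id in friends_of.get(post.owner_id, set())
    if post.privacy_level == SPECIFIC:
        return viewer_id in post.include_users or bool(viewer_lists & post.include_lists)
    return False  # NOBODY, or any unknown level
```

Every candidate post pays for its own include/exclude lookups under this model, which is the per-row cost described below.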

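My problem is that this does not scale. At all. There are currently about 1.2 million posts, and most of them are viewable by everyone. For every post on a page, the query has to check whether there is a row excluding that post from being shown to the user, and a page can be filled with 100-200 posts, so this is very slow. It can take 2-4 seconds, especially when other constraints are added to the query.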

This also creates very large, complex queries that are just... awkward.

SELECT t.* 
FROM posts t 
WHERE ((t.privacy_level = 3 
     AND t.owner_id = ?) 
     OR (t.privacy_level = 4 
      AND EXISTS 
      (SELECT i.id 
       FROM PostPrivacyIncludeFor i 
       WHERE i.user_id = ? 
       AND i.thought_id = t.id) 
      OR t.privacy_level = 4 
      AND t.owner_id = ?) 
     OR (t.privacy_level = 4 
      AND EXISTS 
      (SELECT i2.id 
       FROM PostPrivacyIncludeFor i2 
       WHERE i2.thought_id = t.id 
       AND EXISTS 
        (SELECT r.id 
        FROM FriendFilterIds r 
        WHERE r.list_id = i2.list_id 
        AND r.friend_id = ?)) 
      OR t.privacy_level = 4 
      AND t.owner_id = ?) 
     OR (t.privacy_level = 1 
      AND EXISTS 
      (SELECT G.id 
       FROM Following G 
       WHERE follower_id = t.owner_id 
       AND following_id = ? 
       AND friend = 1) 
      OR t.privacy_level = 1 
      AND t.owner_id = ?) 
     OR (NOT EXISTS 
      (SELECT e.id 
       FROM PostPrivacyExcludeFrom e 
       WHERE e.thought_id = t.id 
       AND e.user_id = ? 
       AND NOT EXISTS 
        (SELECT e2.id 
        FROM PostPrivacyExcludeFrom e2 
        WHERE e2.thought_id = t.id 
        AND EXISTS 
         (SELECT l.id 
         FROM FriendFilterIds l 
         WHERE l.list_id = e2.list_id 
          AND l.friend_id = ?))) 
      AND t.privacy_level IN (0, 1, 4))) 
    AND t.owner_id = ? 
ORDER BY t.created_at LIMIT 100 

(Mock query, similar to the one I'm currently using in Doctrine ORM. It's a mess, but you get what I'm saying.)

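I guess my question is: how would you approach this situation to optimize it? Is there a better way to set up my database? I'm willing to completely scrap the approach I currently have, but I don't know which direction to take.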

Thanks, guys.

Update: Fixed the query to reflect the privacy level values defined above (I forgot to update it when I simplified the values).


You should probably add some line breaks and indentation to your query; it's very hard to follow right now. – 2010-10-11 05:45:25


What does privacy_level = 7 mean? – Martin 2010-10-11 06:53:41


Sorry, I've updated the query to reflect the values in the example (in the real application the privacy values are different). – Dandy 2010-10-11 15:25:50

Answers


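Your query is too long for me to give a definitive solution, but the approach I follow is to simply convert the subqueries into joins that look up the data, and then build the logic into the WHERE clause and the column list of the select statement: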

select t.*, i.*, r.*, G.*, e.* from posts t 
left join PostPrivacyIncludeFor i on i.user_id = ? and i.thought_id = t.id 
left join FriendFilterIds r on r.list_id = i.list_id and r.friend_id = ? 
left join Following G on follower_id = t.owner_id and G.following_id = ? and G.friend=1 
left join PostPrivacyExcludeFrom e on e.thought_id = t.id and e.user_id = ? 

(This may need to be extended: I couldn't follow the logic of the last clause.)

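If you can get the simple select to run quickly and include all the information you need, then all you have to do is build the logic into the select list and the WHERE clause.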
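A minimal sketch of that join-first idea, using Python's built-in sqlite3 with an in-memory SQLite database standing in for MySQL. The tables (posts, include_for, exclude_from, friends) are simplified, hypothetical versions of the question's tables, and filter/list rows are left out; the point is only that each lookup becomes a LEFT JOIN keyed on the viewer, and the visibility rules move into a single WHERE clause.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, owner_id INT, privacy_level INT);
CREATE TABLE include_for (post_id INT, user_id INT);
CREATE TABLE exclude_from (post_id INT, user_id INT);
CREATE TABLE friends (owner_id INT, friend_id INT);
INSERT INTO posts VALUES (1, 10, 0), (2, 10, 1), (3, 10, 4), (4, 10, 3), (5, 10, 0);
INSERT INTO include_for VALUES (3, 20);
INSERT INTO exclude_from VALUES (5, 20);
INSERT INTO friends VALUES (10, 30);
""")

# Each lookup is a LEFT JOIN keyed on the viewer: a match shows up as non-NULL
# join columns, a miss as NULLs. The rules then live in one WHERE clause
# instead of in nested EXISTS subqueries.
VISIBLE_POSTS = """
SELECT DISTINCT t.id
FROM posts t
LEFT JOIN include_for i  ON i.post_id = t.id AND i.user_id = :viewer
LEFT JOIN exclude_from e ON e.post_id = t.id AND e.user_id = :viewer
LEFT JOIN friends f      ON f.owner_id = t.owner_id AND f.friend_id = :viewer
WHERE t.owner_id = :viewer
   OR (e.post_id IS NULL
       AND (t.privacy_level = 0
            OR (t.privacy_level = 1 AND f.owner_id IS NOT NULL)
            OR (t.privacy_level = 4 AND i.post_id IS NOT NULL)))
ORDER BY t.id
"""


def visible_post_ids(viewer):
    return [row[0] for row in conn.execute(VISIBLE_POSTS, {"viewer": viewer})]
```

For example, viewer 20 gets posts 1 and 3 (public, plus the level-4 post that includes them), while post 5 is hidden by its exclusion row. Composite indexes on the (post, viewer) and (owner, viewer) column pairs are what would keep joins like these cheap in MySQL.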


Here's a quick stab at simplifying this without reworking your original design too much.

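With this solution, your web page can simply call the following stored procedure to get the list of posts, filtered for a given user, over a specified number of days.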

call list_user_filtered_posts(<user_id>, <day_interval>); 

The full script can be found here: http://pastie.org/1212812

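I haven't fully tested all of this, and you may find this solution isn't performant enough for your needs, but it may help you fine-tune or modify your existing design.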

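I dropped your post_privacy_exclude_from table and added a user_stalkers table, which works much like the inverse of user_friends. I kept your original post_privacy_includes_for table as in your design, since it allows a user to restrict a specific post to a subset of people.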
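The idea behind user_stalkers can be sketched with Python's sqlite3 (SQLite standing in for MySQL, schema trimmed to the columns that matter). Because the exclusion is keyed on the post owner rather than on each post, one row hides all of that owner's posts from the stalker, and the check is a primary-key probe. The NOT EXISTS form here is just one way to express it; the procedure below does the same job with an UPDATE ... INNER JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_stalkers (
    user_id INTEGER NOT NULL,
    stalker_user_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, stalker_user_id)
);
CREATE TABLE posts (post_id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL);
INSERT INTO posts (user_id) VALUES (1), (1), (4), (4), (4);
-- a single row hides every post by user 4 from user 1
INSERT INTO user_stalkers VALUES (4, 1);
""")


def posts_not_hidden_from(viewer):
    # Anti-join: keep a post unless its owner has listed the viewer as a stalker.
    rows = conn.execute("""
        SELECT p.post_id FROM posts p
        WHERE NOT EXISTS (
            SELECT 1 FROM user_stalkers us
            WHERE us.user_id = p.user_id AND us.stalker_user_id = ?)
        ORDER BY p.post_id""", (viewer,))
    return [r[0] for r in rows]
```

The trade-off of this design is granularity: a stalker is hidden from all of an owner's posts, not from individual ones.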

drop table if exists users; 
create table users 
(
user_id int unsigned not null auto_increment primary key, 
username varbinary(32) unique not null 
) 
engine=innodb; 


drop table if exists user_friends; 
create table user_friends 
(
user_id int unsigned not null, 
friend_user_id int unsigned not null, 
primary key (user_id, friend_user_id) 
) 
engine=innodb; 


drop table if exists user_stalkers; 
create table user_stalkers 
(
user_id int unsigned not null, 
stalker_user_id int unsigned not null, 
primary key (user_id, stalker_user_id) 
) 
engine=innodb; 


drop table if exists posts; 
create table posts 
(
post_id int unsigned not null auto_increment primary key, 
user_id int unsigned not null, 
privacy_level tinyint unsigned not null default 0, 
post_date datetime not null, 
key user_idx(user_id), 
key post_date_user_idx(post_date, user_id) 
) 
engine=innodb; 


drop table if exists post_privacy_includes_for; 
create table post_privacy_includes_for 
(
post_id int unsigned not null, 
user_id int unsigned not null, 
primary key (post_id, user_id) 
) 
engine=innodb; 

Stored procedure

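The stored procedure is relatively simple: it first selects all posts within the specified period and then filters out posts according to your original requirements. I haven't performance-tested this sproc at high volume, but since the initial selection is relatively small, it should perform well enough, and it can simplify your application/middle-tier code.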
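A rough Python model of the select-then-purge steps, run over the rows from the test data further down in place of real tables. It is a sketch, not the procedure itself: it assumes, as the question's query does, that users always see their own posts, and it treats dates as whole "days ago", a simplification of the BETWEEN range. The function name filtered_post_ids is hypothetical.

```python
# Test data rows; post ids follow insertion order (1..17).
# Each tuple is (user_id, privacy_level, days_ago).
POSTS = [
    (1, 0, 8), (1, 0, 8), (2, 0, 7), (2, 0, 7), (3, 0, 6), (4, 0, 6), (5, 0, 5),
    (1, 1, 5), (2, 1, 4), (4, 1, 4), (5, 1, 3),
    (1, 3, 3), (2, 3, 2), (4, 3, 2),
    (1, 4, 1), (4, 4, 1), (5, 4, 0),
]
USER_FRIENDS = {(1, 2), (1, 3), (1, 5), (2, 1), (2, 3), (2, 4),
                (3, 1), (3, 2), (4, 5), (5, 1), (5, 4)}  # (user_id, friend_user_id)
USER_STALKERS = {(4, 1)}                                 # (user_id, stalker_user_id)
INCLUDES_FOR = {(15, 4), (16, 1), (17, 6)}               # (post_id, user_id)


def filtered_post_ids(viewer, day_interval):
    # Step 1: copy every post in the date range into a "temp table", none deleted.
    tmp = {
        pid: {"user_id": user_id, "level": level, "deleted": False}
        for pid, (user_id, level, days_ago) in enumerate(POSTS, start=1)
        if days_ago <= day_interval
    }
    # Later steps: flag each row the viewer may not see.
    for pid, row in tmp.items():
        owner, level = row["user_id"], row["level"]
        if owner == viewer:
            continue  # assumption: the viewer's own posts are never purged
        if (owner, viewer) in USER_STALKERS:
            row["deleted"] = True  # viewer is listed as this owner's stalker
        elif level == 3:
            row["deleted"] = True  # private
        elif level == 1 and (owner, viewer) not in USER_FRIENDS:
            row["deleted"] = True  # friends only, viewer is not a friend
        elif level == 4 and (pid, viewer) not in INCLUDES_FOR:
            row["deleted"] = True  # filtered, viewer not included
    # Output the surviving rows, newest post id first.
    return sorted((pid for pid, row in tmp.items() if not row["deleted"]), reverse=True)
```

Under these assumptions, user 1 over 14 days sees posts 15, 12, 11, 9, 8, 7, 5, 4, 3, 2, 1 (everything by user 4 is gone because user 1 is that user's stalker), and user 6 sees only the public posts 1 to 7 plus post 17, which explicitly includes them.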

drop procedure if exists list_user_filtered_posts; 

delimiter # 

create procedure list_user_filtered_posts 
(
in p_user_id int unsigned, 
in p_day_interval tinyint unsigned 
) 
proc_main:begin 

drop temporary table if exists tmp_posts; 
drop temporary table if exists tmp_priv_posts; 

-- select ALL posts in the required date range (or whatever selection criteria you require) 

create temporary table tmp_posts engine=memory 
select 
    p.post_id, p.user_id, p.privacy_level, 0 as deleted 
from 
    posts p 
where 
    p.post_date between now() - interval p_day_interval day and now() 
order by 
    p.user_id; 

-- purge stalker posts (0,1,3,4) 

update tmp_posts 
inner join user_stalkers us on us.user_id = tmp_posts.user_id and us.stalker_user_id = p_user_id 
set 
    tmp_posts.deleted = 1 
where 
    tmp_posts.user_id != p_user_id; 

-- purge other users private posts (3) 

update tmp_posts set deleted = 1 where user_id != p_user_id and privacy_level = 3; 

-- purge friend only posts (1) i.e where p_user_id is not a friend of the poster 

/* 
    requires another temp table due to mysql temp table problem/bug 
    http://dev.mysql.com/doc/refman/5.0/en/temporary-table-problems.html 
*/ 

-- the friends-only posts (1) this user can see 

create temporary table tmp_priv_posts engine=memory 
select 
    tp.post_id 
from 
    tmp_posts tp 
inner join user_friends uf on uf.user_id = tp.user_id and uf.friend_user_id = p_user_id 
where 
    tp.user_id != p_user_id and tp.privacy_level = 1; 

-- remove friends-only posts (1) this user can't see (the user's own posts are kept) 

update tmp_posts 
left outer join tmp_priv_posts tpp on tmp_posts.post_id = tpp.post_id 
set 
    tmp_posts.deleted = 1 
where 
    tpp.post_id is null and tmp_posts.privacy_level = 1 and tmp_posts.user_id != p_user_id; 

-- purge filtered (4) 

truncate table tmp_priv_posts; -- reuse tmp table 

insert into tmp_priv_posts 
select 
    tp.post_id 
from 
    tmp_posts tp 
inner join post_privacy_includes_for ppif on tp.post_id = ppif.post_id and ppif.user_id = p_user_id 
where 
    tp.user_id != p_user_id and tp.privacy_level = 4; 

-- remove filtered posts (4) this user can't see (the user's own posts are kept) 

update tmp_posts 
left outer join tmp_priv_posts tpp on tmp_posts.post_id = tpp.post_id 
set 
    tmp_posts.deleted = 1 
where 
    tpp.post_id is null and tmp_posts.privacy_level = 4 and tmp_posts.user_id != p_user_id; 

drop temporary table if exists tmp_priv_posts; 

-- output filtered posts (display ALL of these on web page) 

select 
    p.* 
from 
    posts p 
inner join tmp_posts tp on p.post_id = tp.post_id 
where 
    tp.deleted = 0 
order by 
    p.post_id desc; 

-- clean up 

drop temporary table if exists tmp_posts; 

end proc_main # 

delimiter ; 

Test data

Some basic test data.

insert into users (username) values ('f00'),('bar'),('alpha'),('beta'),('gamma'),('omega'); 

insert into user_friends values 
(1,2),(1,3),(1,5), 
(2,1),(2,3),(2,4), 
(3,1),(3,2), 
(4,5), 
(5,1),(5,4); 

insert into user_stalkers values (4,1); 

insert into posts (user_id, privacy_level, post_date) values 

-- public (0) 

(1,0,now() - interval 8 day), 
(1,0,now() - interval 8 day), 
(2,0,now() - interval 7 day), 
(2,0,now() - interval 7 day), 
(3,0,now() - interval 6 day), 
(4,0,now() - interval 6 day), 
(5,0,now() - interval 5 day), 

-- friends only (1) 

(1,1,now() - interval 5 day), 
(2,1,now() - interval 4 day), 
(4,1,now() - interval 4 day), 
(5,1,now() - interval 3 day), 

-- private (3) 

(1,3,now() - interval 3 day), 
(2,3,now() - interval 2 day), 
(4,3,now() - interval 2 day), 

-- filtered (4) 

(1,4,now() - interval 1 day), 
(4,4,now() - interval 1 day), 
(5,4,now()); 

insert into post_privacy_includes_for values (15,4), (16,1), (17,6); 

Tests

As I mentioned earlier, I haven't fully tested this, but on the surface it seems to work.

select * from posts; 

call list_user_filtered_posts(1,14); 
call list_user_filtered_posts(6,14); 

call list_user_filtered_posts(1,7); 
call list_user_filtered_posts(6,7); 

Hope you find some of this useful.