2009-10-13 90 views

MySQL: suggesting objects (optimizing a multi-join query)

Goal: suggest objects based on the user's choices.

Data: a table recording how each user would order a subset of the objects from worst to best; for example:

          1 2 3 4 5 6
    John: A B G J S O
    Mary: A C G L
    Joan: B C L J K
    Stan: G J C L

There are about 20 times as many users as objects, and each user's lineup contains 50-200 objects.

Table:

CREATE TABLE IF NOT EXISTS `pref` (
    `usr` int(10) unsigned NOT NULL, 
    `obj` int(10) unsigned NOT NULL, 
    `ord` int(10) unsigned NOT NULL, 
    UNIQUE KEY `u_o` (`usr`,`obj`), 
    KEY `u` (`usr`) 
) ENGINE=MyISAM DEFAULT CHARSET=utf8; 
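
To make the layout concrete, here is a minimal sketch that loads the four toy lineups above into a `pref` table, with `ord` 1 meaning the worst object. It uses Python's built-in `sqlite3` as a stand-in for MySQL, and text ids instead of the unsigned ints of the real table; both are assumptions for illustration, as is the helper name `lineup_rows`.

```python
import sqlite3

# Toy lineups from the question; each string lists objects from worst to best.
LINEUPS = {
    "John": "ABGJSO",
    "Mary": "ACGL",
    "Joan": "BCLJK",
    "Stan": "GJCL",
}


def lineup_rows(lineups):
    """Flatten {user: lineup} into (usr, obj, ord) rows, with ord 1 = worst."""
    return [
        (usr, obj, position)
        for usr, lineup in lineups.items()
        for position, obj in enumerate(lineup, start=1)
    ]


conn = sqlite3.connect(":memory:")
# Assumption: TEXT ids for readability; the real table uses unsigned ints.
conn.execute(
    "CREATE TABLE pref ("
    " usr TEXT NOT NULL, obj TEXT NOT NULL, ord INTEGER NOT NULL,"
    " UNIQUE (usr, obj))"
)
conn.executemany(
    "INSERT INTO pref (usr, obj, ord) VALUES (?, ?, ?)", lineup_rows(LINEUPS)
)

mary = conn.execute(
    "SELECT obj, ord FROM pref WHERE usr = ? ORDER BY ord", ("Mary",)
).fetchall()
print(mary)  # [('A', 1), ('C', 2), ('G', 3), ('L', 4)]
```

Each lineup becomes one row per object, so the four toy users produce 19 rows in total.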

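Basic idea: iterate over the user's objects starting from the second worst, building pairs (A > B); then look for the same pairs in other users' lineups and list the items those users rank better than A.

Query: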

SELECT e.obj, COUNT(e.obj) AS rate 
FROM pref a, pref b, pref c, pref d, pref e 

WHERE a.usr = '222' # step 1: select a pair of objects A, B, where A is better than B according to user X 
AND a.obj = '111' 
AND b.usr = a.usr 
AND b.ord < a.ord 

AND c.obj = a.obj # step 2: find users thinking that object A is better than B 
AND d.obj = b.obj 
AND d.ord < c.ord 
AND d.usr = c.usr 

AND e.ord > c.ord # step 3: find objects better than A according to these users 
AND e.usr = c.usr 

GROUP BY e.obj 
ORDER BY rate DESC; 
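
The query above can be tried on the toy lineups from the question. The sketch below runs the same five-way self-join through Python's built-in `sqlite3` (an assumption standing in for MySQL, so the `#` comments become `--` and the literals become bound parameters); `make_db` and `suggest` are hypothetical helper names.

```python
import sqlite3

# Toy lineups from the question, worst to best.
LINEUPS = {"John": "ABGJSO", "Mary": "ACGL", "Joan": "BCLJK", "Stan": "GJCL"}

QUERY = """
SELECT e.obj, COUNT(e.obj) AS rate
FROM pref a, pref b, pref c, pref d, pref e
WHERE a.usr = ? AND a.obj = ?          -- step 1: pairs (A > B) for user X
  AND b.usr = a.usr AND b.ord < a.ord
  AND c.obj = a.obj AND d.obj = b.obj  -- step 2: users who also rank A above B
  AND d.usr = c.usr AND d.ord < c.ord
  AND e.usr = c.usr AND e.ord > c.ord  -- step 3: objects they rank above A
GROUP BY e.obj
ORDER BY rate DESC
"""


def make_db(lineups):
    """Build an in-memory pref table (text ids are an illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pref (usr TEXT, obj TEXT, ord INTEGER, UNIQUE (usr, obj))"
    )
    conn.executemany(
        "INSERT INTO pref VALUES (?, ?, ?)",
        [(u, o, i) for u, line in lineups.items() for i, o in enumerate(line, start=1)],
    )
    return conn


def suggest(conn, usr, obj):
    """Return (obj, rate) rows for user `usr` and object A = `obj`."""
    return conn.execute(QUERY, (usr, obj)).fetchall()


conn = make_db(LINEUPS)
rates = dict(suggest(conn, "John", "G"))
print(sorted(rates.items()))  # [('J', 2), ('L', 1), ('O', 2), ('S', 2)]
```

Note that nothing in the query requires `c.usr` to differ from `a.usr`, so the current user's own lineup is matched as well; on this toy data that is where the J, S and O counts come from, while L comes from Mary.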

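Aliases:
a: object A ('111') for the current user ('222')
b: object B, worse than A according to the current user (it has a lower `ord` value than A)
c: object A in another user's lineup
d: object B in another user's lineup
e: an object better than A in another user's lineup

Execution plan (ouo and uo are the indexes suggested by Quassnoi):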

+----+-------------+-------+------+---------------+------+---------+----------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref            | rows | Extra                                        |
+----+-------------+-------+------+---------------+------+---------+----------------+------+----------------------------------------------+
|  1 | SIMPLE      | a     | ref  | ouo,uo        | ouo  | 8       | const,const    |    1 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | b     | ref  | ouo,uo        | uo   | 4       | const          |   86 | Using where                                  |
|  1 | SIMPLE      | d     | ref  | ouo,uo        | ouo  | 4       | db.b.obj       |  587 | Using index                                  |
|  1 | SIMPLE      | c     | ref  | ouo,uo        | ouo  | 8       | const,db.d.usr |    1 | Using where; Using index                     |
|  1 | SIMPLE      | e     | ref  | uo            | uo   | 4       | db.d.usr       |   80 | Using where                                  |
+----+-------------+-------+------+---------------+------+---------+----------------+------+----------------------------------------------+

The query seems to work fine as long as the dataset is not too large; any ideas on how to streamline it to support larger datasets?

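How many objects will each user have on average? – Quassnoi 2009-10-13 15:14:00

Each user has about 50-200 objects. – Mike 2009-10-13 15:56:03

Do you really need this exact ranking? Without it, the query could easily be improved. Also, could you post the query's execution plan? – Quassnoi 2009-10-13 19:42:06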

Answers

The query is fine; just create the following indexes:

pref (obj, usr, ord) 
pref (usr, ord) 
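
As a rough sketch, the two column lists above could be written as plain `CREATE INDEX` statements. The index names `ouo` and `uo` are borrowed from the execution plan in the question; the statements are run here against an in-memory SQLite table through Python's `sqlite3` (an assumption made only so the snippet is runnable; the same statement form should also be accepted by MySQL). `index_columns` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pref ("
    " usr INTEGER NOT NULL, obj INTEGER NOT NULL, ord INTEGER NOT NULL,"
    " UNIQUE (usr, obj))"
)

# Index names ouo and uo are taken from the execution plan in the question.
INDEX_DDL = [
    "CREATE INDEX ouo ON pref (obj, usr, ord)",
    "CREATE INDEX uo ON pref (usr, ord)",
]
for statement in INDEX_DDL:
    conn.execute(statement)


def index_columns(conn, name):
    """List an index's columns in key order (SQLite-specific PRAGMA)."""
    return [row[2] for row in conn.execute(f"PRAGMA index_info({name})")]


print(index_columns(conn, "ouo"))  # ['obj', 'usr', 'ord']
print(index_columns(conn, "uo"))   # ['usr', 'ord']
```

Column order matters: `ouo` serves lookups that start from a known object, while `uo` serves lookups that start from a known user and then filter by `ord`.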

Update:

Try this syntax.

The rating system is simpler but very similar: on the random test data I created, it gives almost the same ratings.

SELECT oa.obj, SUM(weight) AS rate 
FROM (
     SELECT usr, ord, 
       (
       SELECT COUNT(*) 
       FROM pref a 
       JOIN pref ob 
       ON  ob.obj = a.obj 
       WHERE ob.usr = o.usr 
         AND a.usr = 50 
         AND a.ord < 
         (
         SELECT ord 
         FROM pref ai 
         WHERE ai.usr = 50 
           AND ai.obj = 75 
         ) 
         AND ob.ord < o.ord 
       ) AS weight 
     FROM pref o 
     WHERE o.obj = 75 
     HAVING weight >= 0 
     ) ow 
JOIN pref oa 
ON  oa.usr = ow.usr 
     AND oa.ord > ow.ord 
GROUP BY 
     oa.obj 
ORDER BY 
     rate DESC 

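For every user who rated A, this query assigns a weight to each item that user rated higher than A.

The weight equals the number of items that both users rated lower than A.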
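
The weighting described above can be sketched in plain Python on the toy lineups from the question. This is an illustration of the description, not the SQL itself: `weighted_rates` is a hypothetical helper, and it keeps zero-weight users in the result, on the assumption that this mirrors `HAVING weight >= 0`.

```python
from collections import defaultdict

# Toy lineups from the question, worst to best.
LINEUPS = {"John": "ABGJSO", "Mary": "ACGL", "Joan": "BCLJK", "Stan": "GJCL"}


def weighted_rates(lineups, current, target):
    """Sum, per object ranked above `target`, the weights of the users who rank it so.

    A user's weight is the number of objects that both `current` and that
    user rank below `target`.
    """
    # rank[user][obj] is the 1-based position, 1 being the worst.
    rank = {
        usr: {obj: pos for pos, obj in enumerate(line, start=1)}
        for usr, line in lineups.items()
    }
    mine = rank[current]
    below_mine = {obj for obj, pos in mine.items() if pos < mine[target]}

    rates = defaultdict(int)
    for usr, theirs in rank.items():
        if target not in theirs:
            continue  # this user never rated A
        below_theirs = {obj for obj, pos in theirs.items() if pos < theirs[target]}
        weight = len(below_mine & below_theirs)
        for obj, pos in theirs.items():
            if pos > theirs[target]:
                rates[obj] += weight
    return dict(rates)


rates = weighted_rates(LINEUPS, "John", "G")
print(sorted(rates.items()))
# [('C', 0), ('J', 2), ('L', 1), ('O', 2), ('S', 2)]
```

Here Stan ranks G last, so his weight is 0 and his contribution only adds C with a rate of 0; Mary shares one lower-ranked object (A) with John, which gives L a rate of 1.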

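Thanks! The result is not great, though. For a user with 150 objects in a table of 250k rows, it takes about 8-9 seconds (returning 376 rows). :/ I tried changing the table type from MyISAM to MEMORY, but for some reason it doesn't want to use the multi-column indexes. Weird. – Mike 2009-10-13 18:38:39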