2012-12-22 61 views
2

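MySQL subquery with GROUP BY in a LEFT JOIN: optimization

MySQL does not seem able to optimize a SELECT that joins against a GROUP BY subquery, and it ends up with long execution times. There must be a known optimization for such a common scenario.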

Suppose we want to return all orders from the database, each with a flag indicating whether it is the customer's first order.

CREATE TABLE orders (`order` int, customer int, date date);
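For experimenting with the queries on this page, a few sample rows help. The sketch below is not part of the original question: the customer/order pairs mirror the SQL Fiddle output shown in the second answer, and the dates are invented for illustration. Note that ORDER is a reserved word in MySQL, so the column name has to be quoted with backticks.

```sql
-- Hypothetical sample data; the dates are made up for illustration.
INSERT INTO orders (`order`, customer, `date`) VALUES
    (1, 1, '2012-01-05'),
    (2, 1, '2012-02-11'),
    (3, 1, '2012-03-20'),
    (4, 2, '2012-03-21'),
    (5, 2, '2012-04-02'),
    (6, 3, '2012-05-15'),
    (7, 4, '2012-06-30');
```

With these rows, the first orders per customer are 1, 4, 6 and 7.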

Retrieving each customer's first order is super fast.

SELECT customer, MIN(`order`) AS first_order FROM orders GROUP BY customer;

However, as soon as we use it as a subquery:

SELECT `order`, first_order FROM orders LEFT JOIN (
    SELECT customer, MIN(`order`) AS first_order FROM orders GROUP BY customer
) AS first_orders ON orders.`order` = first_orders.first_order;

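joining this to the full set of orders makes the query very slow. I hope there is a simple trick we are missing, because otherwise it is about 1000 times faster to do the following: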

CREATE TEMPORARY TABLE tmp_first_order AS
    SELECT customer, MIN(`order`) AS first_order FROM orders GROUP BY customer;
CREATE INDEX tmp_boost ON tmp_first_order (first_order);

SELECT `order`, first_order FROM orders LEFT JOIN tmp_first_order
    ON orders.`order` = tmp_first_order.first_order;

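EDIT
Inspired by option 3 proposed by @ruakh, there is indeed a less ugly workaround using INNER JOIN and UNION. It has acceptable performance and does not require temporary tables. However, it is somewhat specific to our case, and I would like to know whether a more generic optimization exists.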

SELECT `order`, "YES" AS first FROM orders INNER JOIN (
    SELECT MIN(`order`) AS first_order FROM orders GROUP BY customer
) AS first_orders_1 ON orders.`order` = first_orders_1.first_order
UNION
SELECT `order`, "NO" AS first FROM orders INNER JOIN (
    SELECT customer, MIN(`order`) AS first_order FROM orders GROUP BY customer
) AS first_orders_2 ON first_orders_2.customer = orders.customer
    AND orders.`order` > first_orders_2.first_order;
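One possible refinement of the UNION workaround, offered as a sketch rather than a measured improvement: the two branches cannot return the same row (one matches the minimum, the other only orders greater than it), so UNION ALL should give the same result while skipping the duplicate-elimination step that plain UNION performs. This assumes that `order` values are unique, which the question does not state explicitly.

```sql
-- Same query as above, but with UNION ALL so that MySQL does not have to
-- de-duplicate the combined result. Assumes `order` is unique per row.
SELECT `order`, 'YES' AS first FROM orders INNER JOIN (
    SELECT MIN(`order`) AS first_order FROM orders GROUP BY customer
) AS first_orders_1 ON orders.`order` = first_orders_1.first_order
UNION ALL
SELECT `order`, 'NO' AS first FROM orders INNER JOIN (
    SELECT customer, MIN(`order`) AS first_order FROM orders GROUP BY customer
) AS first_orders_2 ON first_orders_2.customer = orders.customer
    AND orders.`order` > first_orders_2.first_order;
```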
+0

A few ideas: analyze the execution plan (EXPLAIN the query); indexes; a subquery instead of the left join.

+0

kristox, did you check my answer?
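The first comment's suggestions can be tried roughly as follows. This is only a sketch: the index name `idx_orders_customer_order` is hypothetical, and whether the optimizer actually uses the index depends on the MySQL version and the data. A commonly cited reason for this kind of slowdown is that older MySQL versions materialize the derived table without any index, but that is an assumption here, not something the post confirms.

```sql
-- Inspect how MySQL plans the slow query; look at the row for the
-- derived table and at the estimated row counts.
EXPLAIN
SELECT `order`, first_order
  FROM orders
  LEFT JOIN (SELECT customer, MIN(`order`) AS first_order
               FROM orders
              GROUP BY customer) AS first_orders
    ON orders.`order` = first_orders.first_order;

-- Hypothetical composite index: it lets the GROUP BY customer / MIN(`order`)
-- subquery be answered from the index alone.
CREATE INDEX idx_orders_customer_order ON orders (customer, `order`);
```

Running the EXPLAIN again after creating the index shows whether the plan changed.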

Answers

3

Here are some things you can try:

  1. Remove customer from the subquery's field list, since it isn't doing anything anyway:

    SELECT `order`,
           first_order
      FROM orders
      LEFT JOIN (SELECT MIN(`order`) AS first_order
                   FROM orders
                  GROUP BY customer
                ) AS first_orders
        ON orders.`order` = first_orders.first_order
    ;
    
  2. Conversely, add customer to the ON clause, so that it actually does something for you:

    SELECT `order`,
           first_order
      FROM orders
      LEFT JOIN (SELECT customer,
                        MIN(`order`) AS first_order
                   FROM orders
                  GROUP BY customer
                ) AS first_orders
        ON orders.customer = first_orders.customer
       AND orders.`order` = first_orders.first_order
    ;
    
  3. Same as the previous option, but use an INNER JOIN instead of a LEFT JOIN, and convert your original ON clause into a CASE expression:

    SELECT `order`,
           CASE WHEN first_order = `order` THEN first_order END AS first_order
      FROM orders
     INNER JOIN (SELECT customer,
                        MIN(`order`) AS first_order
                   FROM orders
                  GROUP BY customer
                ) AS first_orders
        ON orders.customer = first_orders.customer
    ;
    
  4. Replace the whole JOIN approach with an uncorrelated IN-subquery in a CASE expression:

    SELECT `order`,
           CASE WHEN `order` IN
                     (SELECT MIN(`order`)
                        FROM orders
                       GROUP BY customer
                     )
                THEN `order`
           END AS first_order
      FROM orders
    ;
    
  5. Replace the whole JOIN approach with a correlated EXISTS-subquery in a CASE expression:

    SELECT `order`,
           CASE WHEN NOT EXISTS
                     (SELECT 1
                        FROM orders AS o2
                       WHERE o2.customer = o1.customer
                         AND o2.`order` < o1.`order`
                     )
                THEN `order`
           END AS first_order
      FROM orders AS o1
    ;
    

(It's quite likely that some of the above will actually perform worse, but I think they're all worth trying.)
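Since the question asks for a flag rather than the first order's number, option 3 can be adapted along these lines. This is a sketch, not part of the original answer: the aliases `o` and `f` and the column name `is_first` are made up for illustration, and because of the INNER JOIN any rows with a NULL customer would be dropped.

```sql
-- Option 3 rewritten to return a YES/NO flag for every order.
SELECT o.`order`,
       CASE WHEN o.`order` = f.first_order THEN 'YES' ELSE 'NO' END AS is_first
  FROM orders AS o
 INNER JOIN (SELECT customer,
                    MIN(`order`) AS first_order
               FROM orders
              GROUP BY customer
            ) AS f
    ON o.customer = f.customer;
```

Every order joins to exactly one row of the grouped subquery (its own customer), so the result has one row per order.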

+0

Awesome answer...

+0

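Good answer, @ruakh. Option 3 is interesting, but in your example it will only return the first orders. That is, if you have 100 customers and 2000 orders, it will return only the 100 first orders. Inspired by your suggestion, I tried something with a 'UNION' that seems to work. – kristox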

+0

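@kristox: Re: "if you have 100 customers and 2000 orders, then [option 3] will return only the 100 first orders": that's not true. Are you sure you copied the 'ON' clause correctly? – ruakh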

1

I would expect it to be faster to use a variable instead of the LEFT JOIN:

SELECT
    `order`,
    IF(@previous_customer <> (@previous_customer := `customer`),
       `order`,
       NULL
    ) AS first_order
FROM orders
JOIN (SELECT @previous_customer := -1) x
ORDER BY customer, `order`;

Here is what my example returns on SQL Fiddle:

CUSTOMER ORDER FIRST_ORDER 
1   1  1 
1   2  (null) 
1   3  (null) 
2   4  4 
2   5  (null) 
3   6  6 
4   7  7 
+0

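[Section 9.4 of the MySQL Reference Manual](http://dev.mysql.com/doc/refman/5.6/en/user-variables.html) advises that you should never "assign a value to a user variable and read the value within the same statement", on the grounds that you can't be guaranteed it will always give the results you expect (across MySQL version changes, execution-plan changes, and so on). – ruakh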
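On MySQL 8.0 and later, which postdate this thread, window functions offer a way to get the same result without assigning and reading a user variable in one statement. The following is a sketch under that version assumption and was not proposed by either answerer; its performance relative to the other options is untested here.

```sql
-- Requires MySQL 8.0+ (window functions).
-- MIN(...) OVER (PARTITION BY customer) yields each customer's first order
-- on every row, so the CASE keeps the value only on the matching row.
SELECT customer,
       `order`,
       CASE WHEN `order` = MIN(`order`) OVER (PARTITION BY customer)
            THEN `order`
       END AS first_order
  FROM orders
 ORDER BY customer, `order`;
```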