2017-07-19

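How can I optimize this query? It takes 3 minutes to run. I have a database with 3 tables:

  1. A `calendar` table with one row for every date between 2000-01-01 and 2040-01-01, 14,610 rows in total
  2. A `locations` table with one row per location, 12 rows in total
  3. A `receipts` table with `id`, `datetime`, and several other unrelated fields, about 250,000 rows in total

I am trying to get the count of `receipts` for each day in a date range, grouped by location, with a count of 0 for days that have no receipts.

I have a working query, but it takes ~3 minutes to run: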

SELECT 
    `locations`.`name` AS `location`, 
    `calendar`.`date` AS `date`, 
    COUNT(`receipts`.`id`) AS `count` 
FROM `locations` 
    CROSS JOIN `calendar` 
    LEFT JOIN `receipts` ON `calendar`.`date` = DATE(`receipts`.`datetime`) 
     AND `locations`.`id` = UPPER(LEFT(`receipts`.`id`, 1)) # there is no `location_id` FK. First char of receipts id is same as location id 
WHERE `calendar`.`date` >= '2017-04-01' AND `calendar`.`date` <= '2017-04-07' 
GROUP BY `locations`.`id`, `calendar`.`id` 
ORDER BY `locations`.`name` ASC, `calendar`.`date` ASC; 

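I believe it has something to do with the WHERE clause.

I changed the WHERE to the following instead. It runs instantly, but it no longer gives me zero counts for days without receipts: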

SELECT 
    `locations`.`name` AS `location`, 
    `calendar`.`date` AS `date`, 
    COUNT(`receipts`.`id`) AS `count` 
FROM `locations` 
    CROSS JOIN `calendar` 
    LEFT JOIN `receipts` ON `calendar`.`date` = DATE(`receipts`.`datetime`) 
     AND `locations`.`id` = UPPER(LEFT(`receipts`.`id`, 1)) # there is no `location_id` FK. First char of receipts id is same as location id 
WHERE DATE(`receipts`.`datetime`) >= '2017-04-01' AND DATE(`receipts`.`datetime`) <= '2017-04-07' 
GROUP BY `locations`.`id`, `calendar`.`id` 
ORDER BY `locations`.`name` ASC, `calendar`.`date` ASC; 

Then I started experimenting with subqueries, but without success:

SELECT 
    `locations`.`name` AS `location`, 
    `cal`.`date` AS `date`, 
    COUNT(`receipts`.`id`) AS `count` 
FROM `locations` 
    CROSS JOIN (
     SELECT `calendar`.`id`, `calendar`.`date` 
     FROM `calendar` 
     WHERE `calendar`.`date` >= '2017-04-01' AND `calendar`.`date` <= '2017-04-07' 
    ) `cal` 
    LEFT JOIN `receipts` ON `cal`.`date` = DATE(`receipts`.`datetime`) 
     AND `locations`.`id` = UPPER(LEFT(`receipts`.`id`, 1)) # there is no `location_id` FK. First char of receipts id is same as location id 
WHERE DATE(`receipts`.`datetime`) >= '2017-04-01' AND DATE(`receipts`.`datetime`) <= '2017-04-07' 
GROUP BY `locations`.`id`, `cal`.`id` 
ORDER BY `locations`.`name` ASC, `cal`.`date` ASC; 

Is there any way I can speed up the first query, since that is the one that gives me the output I want?

+2

I think the string comparison here is what causes the slowdown. Could you consider adding a foreign key? –

+1

Do you have proper indexes? Try running EXPLAIN on your SELECT statement and check whether it uses an index. – money

+1

And include the EXPLAIN output in the question so we can see it. – Shadow
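
As a sketch of what the commenters are asking for: in MySQL, prefixing a statement with `EXPLAIN` prints the execution plan instead of running the query. Applied to the first query from the question, it would look roughly like this:

```sql
-- EXPLAIN shows the plan MySQL chooses; it does not return the query's rows.
EXPLAIN
SELECT
    `locations`.`name` AS `location`,
    `calendar`.`date` AS `date`,
    COUNT(`receipts`.`id`) AS `count`
FROM `locations`
    CROSS JOIN `calendar`
    LEFT JOIN `receipts` ON `calendar`.`date` = DATE(`receipts`.`datetime`)
     AND `locations`.`id` = UPPER(LEFT(`receipts`.`id`, 1))
WHERE `calendar`.`date` >= '2017-04-01' AND `calendar`.`date` <= '2017-04-07'
GROUP BY `locations`.`id`, `calendar`.`id`
ORDER BY `locations`.`name` ASC, `calendar`.`date` ASC;
```

In the output, a `type` of `ALL` on the `receipts` row indicates a full table scan, and a NULL `key` column means no index was chosen for that table.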

Answers

4

Try this:

SELECT l.name location, c.date, COUNT(r.id) count 
FROM calendar c 
    left join calendar n on n.Date = c.Date + 1 -- one day after c.date 
    left join (locations l join receipts r 
       on r.id like '%' + l.Id) 
    on r.datetime between c.Date and n.Date 
where c.Date between '2017-04-01' and '2017-04-07' 
GROUP BY l.id, c.id 
ORDER BY l.name, c.date; 

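The causes of your problem are:
1. You are using a cross join unnecessarily. A cross join creates a Cartesian product (every row on one side is combined with every row on the other side). For example, cross joining the letters of the alphabet with the ten digits would produce 260 rows {A0, A1, A2 ... A9, B0, B1, ... B9 ... etc.}
2. The fact that there are multiple functions in your SQL query (although one would be enough) forces the query processor to read every row of the table from disk, effectively preventing it from using any index that may exist on the table. Applying a function to a column value in a filter (WHERE clause) or a sort (ORDER BY clause) has this effect, because the query processor cannot know the function's value without executing it, and it must read the row from the main table on disk to get the underlying value the function is executed on. If the predicate used only the raw column value, and that column were in an index, the processor would not need to read the main data table. It could just traverse the index, which is usually much smaller and requires fewer disk I/Os.

A predicate that can use an index in this way is called SARGable.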
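
To illustrate the point about functions on columns, here is a minimal sketch using the `receipts` table from the question. It assumes `datetime` is a DATETIME column and that an index on it exists or is created; the index name `idx_receipts_datetime` is hypothetical.

```sql
-- Hypothetical index on the raw column (the name is an assumption).
CREATE INDEX idx_receipts_datetime ON receipts (`datetime`);

-- Not SARGable: DATE() has to be evaluated for every row,
-- so the index on `datetime` cannot be used to narrow the search.
SELECT COUNT(*)
FROM receipts
WHERE DATE(`datetime`) >= '2017-04-01' AND DATE(`datetime`) <= '2017-04-07';

-- SARGable: the bare column is compared with constants.
-- The half-open range covers the same seven days, up to but not including 2017-04-08.
SELECT COUNT(*)
FROM receipts
WHERE `datetime` >= '2017-04-01' AND `datetime` < '2017-04-08';
```

Both statements count the same rows; only the second one gives the optimizer the option of a range scan on the index.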

If `c.Date + 1` is not possible in MySQL, then try this:

SELECT l.name location, c.date, COUNT(r.id) count 
FROM calendar c 
    left join calendar n on n.Date = 
     (Select min(date) from Calendar -- subquery gets the next day in calendar 
     Where date > c.Date)   
    left join (locations l join receipts r 
       on r.id like '%' + l.Id) 
    on r.datetime between c.Date and n.Date 
where c.Date between '2017-04-01' and '2017-04-07' 
GROUP BY l.id, c.id 
ORDER BY l.name, c.date; 
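
The queries in this answer use `+` for both date arithmetic and string concatenation, which is not how MySQL treats that operator (in MySQL, `+` is numeric addition). A hedged MySQL-flavored sketch of the same join shape might look like the following. It assumes `calendar.date` is a DATE and `receipts.datetime` a DATETIME, and it swaps in two things the answer does not use: a prefix pattern (the question says the first character of the receipt id is the location id) and a half-open range instead of `BETWEEN`, so a receipt at exactly midnight is not counted on two days.

```sql
SELECT l.name AS location, c.date, COUNT(r.id) AS `count`
FROM calendar c
    LEFT JOIN (locations l JOIN receipts r
               ON r.id LIKE CONCAT(l.id, '%'))  -- CONCAT is MySQL's string concatenation
        ON r.datetime >= c.date
       AND r.datetime < c.date + INTERVAL 1 DAY  -- MySQL date arithmetic for "the next day"
WHERE c.date BETWEEN '2017-04-01' AND '2017-04-07'
GROUP BY l.id, c.id
ORDER BY l.name, c.date;
```

Note that with this join shape a day with no receipts at all yields a single row with a NULL location, not one zero row per location, so it is not a drop-in replacement for the cross join in the question.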
+0

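Thanks for the help. I tried the first query you posted, but it still takes a long time to run. However, I ended up solving this myself (see my answer). I had to move the WHERE conditions into the joined subqueries, which solved my problem. I suspect this is because it is no longer trying to join so many rows against the already large calendar table. Limiting the result sets before joining fixed it. I am no expert on this, so I am not sure that is the reason, but hey, it works! –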

0
SELECT 
    `locations`.`name` AS `location`, 
    `calendar`.`date` AS `date`, 
    COUNT(`receipts`.`id`) AS `count` 
FROM `locations` 
    CROSS JOIN `calendar` 
    LEFT JOIN `receipts` ON `calendar`.`date` = DATE(`receipts`.`datetime`) 
     AND `locations`.`id` = UPPER(LEFT(`receipts`.`id`, 1)) # there is no `location_id` FK. First char of receipts id is same as location id 
WHERE `calendar`.`date` BETWEEN '2017-04-01' AND '2017-04-07' 
GROUP BY `locations`.`id`, `calendar`.`id` 
ORDER BY `locations`.`name` ASC, `calendar`.`date` ASC; 

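Try the query above.

Here I used BETWEEN instead of `>=` and `<=`.

You can also create an index on the `calendar.date` column.

You could add a FOREIGN KEY constraint on the child table and join on that column. In that case an INDEX would also help.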
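
A sketch of what these suggestions could look like in MySQL follows. The index names are hypothetical. The second statement does not add a foreign key; as a swapped-in alternative it uses a stored generated column (available from MySQL 5.7) to materialize the location prefix, on the assumption that `locations.id` is a single character, as the comment in the question suggests.

```sql
-- Index on the calendar date, as suggested above (index name is an assumption).
CREATE INDEX idx_calendar_date ON calendar (`date`);

-- Hypothetical: store the first character of receipts.id in its own column
-- and index it together with `datetime`, so the join no longer needs
-- UPPER(LEFT(id, 1)) to be computed at query time.
ALTER TABLE receipts
    ADD COLUMN location_id CHAR(1)
        GENERATED ALWAYS AS (UPPER(LEFT(id, 1))) STORED,
    ADD INDEX idx_receipts_location_datetime (location_id, `datetime`);
```

With such a column in place, the join condition could be written as `locations.id = receipts.location_id`.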

+0

`BETWEEN` did not speed it up. However, I ended up solving it myself (see my answer). Thanks anyway. –

+1

`BETWEEN` always performs the same as `<=` and `>=`. (It is not functionally equivalent to `<` and `>`, because it is inclusive.) –

+0

@RickJames Got it, man!
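
A small sketch of the equivalence discussed in these comments, assuming `calendar.date` is a DATE column and `receipts.datetime` is a DATETIME column:

```sql
-- These two predicates select the same rows: BETWEEN includes both endpoints.
SELECT COUNT(*) FROM calendar
WHERE `date` BETWEEN '2017-04-01' AND '2017-04-07';

SELECT COUNT(*) FROM calendar
WHERE `date` >= '2017-04-01' AND `date` <= '2017-04-07';

-- On a DATETIME column the upper bound '2017-04-07' means 2017-04-07 00:00:00,
-- so receipts later on April 7th are excluded by this predicate.
SELECT COUNT(*) FROM receipts
WHERE `datetime` BETWEEN '2017-04-01' AND '2017-04-07';
```

The inclusive endpoints are harmless on a DATE column but easy to get wrong on a DATETIME column.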

0

Sorry for wasting everyone's time, but I managed to solve this myself.

Here is the query I came up with. It runs instantly:

SELECT 
    `l`.`name` AS `location`, 
    `c`.`date` AS `date`, 
    COUNT(`r`.`id`) AS `count` 
FROM `locations` AS `l` 
    CROSS JOIN (
     SELECT `calendar`.`id`, `calendar`.`date` 
     FROM `calendar` 
     WHERE `calendar`.`date` >= '2017-04-01' AND `calendar`.`date` <= '2017-04-07' 
    ) `c` 
    LEFT JOIN (
     SELECT `receipts`.`id`, `receipts`.`datetime` 
     FROM `receipts` 
     WHERE DATE(`receipts`.`datetime`) >= '2017-04-01' AND DATE(`receipts`.`datetime`) <= '2017-04-07' 
    ) `r` ON `c`.`date` = DATE(`r`.`datetime`) AND `l`.`id` = UPPER(LEFT(`r`.`id`, 1)) 
GROUP BY `l`.`id`, `c`.`id` 
ORDER BY `l`.`name` ASC, `c`.`date` ASC;
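
The query above works because both sides are cut down to the requested week before they are joined. As a possible refinement, not something from the original answer, the filter inside the `receipts` subquery can be written on the bare column so that an index on `datetime` could be used if one exists. This sketch assumes `datetime` is a DATETIME column:

```sql
SELECT
    `l`.`name` AS `location`,
    `c`.`date` AS `date`,
    COUNT(`r`.`id`) AS `count`
FROM `locations` AS `l`
    CROSS JOIN (
        SELECT `calendar`.`id`, `calendar`.`date`
        FROM `calendar`
        WHERE `calendar`.`date` >= '2017-04-01' AND `calendar`.`date` <= '2017-04-07'
    ) `c`
    LEFT JOIN (
        SELECT `receipts`.`id`, `receipts`.`datetime`
        FROM `receipts`
        -- Half-open range on the raw column: same seven days, no DATE() call per row.
        WHERE `receipts`.`datetime` >= '2017-04-01' AND `receipts`.`datetime` < '2017-04-08'
    ) `r` ON `c`.`date` = DATE(`r`.`datetime`) AND `l`.`id` = UPPER(LEFT(`r`.`id`, 1))
GROUP BY `l`.`id`, `c`.`id`
ORDER BY `l`.`name` ASC, `c`.`date` ASC;
```

The functions in the `ON` clause remain, but they now run only against the receipts that fall within the week.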