2015-07-21

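How to improve subquery performance in MySQL

I have a CRM system that uses an EAV model to generate attributes. As you probably know, the EAV model requires complex queries to extract data, because every attribute must be returned in a separate column.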

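When subqueries are used, MySQL performance is terrible. I need a better way to write my queries, by analyzing them with the given WHERE clauses, sort order, and limit (if any).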

By subqueries I mean queries that look like this:

SELECT a.account_name, a.account_type, a.status, a.account_id, s.fieldValue, s2.last_training_on, s3.fieldValue 
FROM accounts AS a 
INNER JOIN clients AS c ON c.client_id = a.client_id 
LEFT JOIN (
    SELECT p.related_to AS account_id, decimal_value AS fieldValue 
    FROM df_answers_text AS p 
    INNER JOIN df_field_to_client_relation AS r ON r.field_id = p.field_id 
    WHERE p.field_id = '19' AND r.client_id = '7'
) AS s ON s.account_id = a.account_id 
LEFT JOIN (
    SELECT p.related_to AS account_id, datetime_value AS last_training_on 
    FROM df_answers_text AS p 
    INNER JOIN df_field_to_client_relation AS r ON r.field_id = p.field_id 
    WHERE p.field_id = '10' AND r.client_id = '7'
) AS s2 ON s2.account_id = a.account_id 
LEFT JOIN (
    SELECT 
     p.related_to 
    , CAST(GROUP_CONCAT(o.label SEPARATOR " | ") AS CHAR(255)) AS fieldValue 
    FROM df_answer_predefined AS p 
    INNER JOIN df_fields_options AS o ON o.option_id = p.option_id 
    INNER JOIN df_field_to_client_relation AS r ON r.field_id = o.field_id 
    WHERE o.is_place_holder = 0 AND o.field_id = '16' AND r.field_id = '16' AND r.client_id = '7' 
    GROUP BY p.related_to
) AS s3 ON s3.related_to = a.account_id 
WHERE c.client_id = '7' AND c.status = 'Active' AND (a.account_type = 'TEST' OR a.account_type = 'VALUE' OR s2.last_training_on > '2015-01-01 00:00:00') AND (s.fieldValue = 'Medium' OR s.fieldValue = 'Low' OR a.expType = 'Very High') 
ORDER BY a.account_name 
LIMIT 500; 

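I thought about creating temporary tables with the MEMORY engine, filled with the content of the subqueries, like this: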

CREATE TEMPORARY TABLE s (KEY(account_id, fieldValue)) ENGINE = MEMORY 
SELECT p.related_to AS account_id, decimal_value AS fieldValue 
FROM df_answers_text AS p 
INNER JOIN df_field_to_client_relation AS r ON r.field_id = p.field_id 
WHERE p.field_id = '19' AND r.client_id = '7'; 

CREATE TEMPORARY TABLE s2 (KEY(account_id), INDEX USING BTREE (last_training_on)) ENGINE = MEMORY 
SELECT p.related_to AS account_id, datetime_value AS last_training_on 
FROM df_answers_text AS p 
INNER JOIN df_field_to_client_relation AS r ON r.field_id = p.field_id 
WHERE p.field_id = '10' AND r.client_id = '7'; 


    CREATE TEMPORARY TABLE s3 (KEY(related_to, fieldValue)) ENGINE = MEMORY 
    SELECT 
     p.related_to 
    , CAST(GROUP_CONCAT(o.label SEPARATOR " | ") AS CHAR(255)) AS fieldValue 
    FROM df_answer_predefined AS p 
    INNER JOIN df_fields_options AS o ON o.option_id = p.option_id 
    INNER JOIN df_field_to_client_relation AS r ON r.field_id = o.field_id 
    WHERE o.is_place_holder = 0 AND o.field_id = '16' AND r.field_id = '16' AND r.client_id = '7' 
    GROUP BY p.related_to; 



Then my new query will look like this 

    SELECT a.account_name, a.account_type, a.status, a.account_id, s.fieldValue, s2.last_training_on, s3.fieldValue 
    FROM accounts AS a 
    INNER JOIN clients AS c ON c.client_id = a.client_id 
    LEFT JOIN s ON s.account_id = a.account_id 
    LEFT JOIN s2 ON s2.account_id = a.account_id 
    LEFT JOIN s3 ON s3.related_to = a.account_id 
    WHERE c.client_id = '7' AND c.status = 'Active' AND (a.account_type = 'TEST' OR a.account_type = 'VALUE' OR s2.last_training_on > '2015-01-01 00:00:00') AND (s.fieldValue = 'Medium' OR s.fieldValue = 'Low' OR a.expType = 'Very High') 
    ORDER BY a.account_name 
    LIMIT 500; 

    DROP TEMPORARY TABLE s, s2, s3; 

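The problem I am facing now is that each temporary table is built from all of the matching data in the database, which takes a lot of time. But my outer query only looks for 500 records sorted by a.account_name. If a temporary table ends up with 1 million records, that time is wasted and performance will obviously be bad.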

I am looking for a better way to pass the clauses down to the subqueries, so that each temporary table holds only the data the outer query needs.

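Note: these queries are generated dynamically from a GUI. I can't figure out how to extract the logic/clauses and pass them correctly to the subqueries.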

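Questions

  • How can I look at the WHERE clauses, parse them, and pass them to the subqueries to reduce the amount of data in the subqueries? If the clauses were all "AND", my life would be easier, but since I have a mix of "AND" and "OR", it is very complicated.
  • Is there a better approach to this problem than using temporary tables?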

EDITED: Here are my table definitions

CREATE TABLE df_answer_predefined (
    answer_id int(11) unsigned NOT NULL AUTO_INCREMENT, 
    field_id int(11) unsigned DEFAULT NULL, 
    related_to int(11) unsigned DEFAULT NULL, 
    option_id int(11) unsigned DEFAULT NULL, 
    created_by int(11) unsigned NOT NULL, 
    created_on datetime DEFAULT CURRENT_TIMESTAMP, 
    PRIMARY KEY (answer_id), 
    UNIQUE KEY un_row (field_id,option_id,related_to), 
    KEY field_id (field_id), 
    KEY related_to (related_to), 
    KEY to_delete (field_id,related_to), 
    KEY outter_view (field_id,option_id,related_to) 
) ENGINE=InnoDB AUTO_INCREMENT=4946214 DEFAULT CHARSET=utf8;

`CREATE TABLE df_fields_options (
    option_id int(11) unsigned NOT NULL AUTO_INCREMENT, 
    field_id int(11) unsigned NOT NULL, 
    label varchar(255) DEFAULT NULL, 
    is_place_holder tinyint(1) NOT NULL DEFAULT '0', 
    is_default tinyint(1) NOT NULL DEFAULT '0', 
    sort smallint(3) NOT NULL DEFAULT '1', 
    status tinyint(1) NOT NULL DEFAULT '1', 
    PRIMARY KEY (option_id), 
    KEY i (field_id), 
    KEY d (option_id,field_id,is_place_holder) 
) ENGINE=InnoDB AUTO_INCREMENT=155 DEFAULT CHARSET=utf8;` 


`CREATE TABLE df_field_to_client_relation (
    relation_id int(11) unsigned NOT NULL AUTO_INCREMENT, 
    client_id int(11) unsigned DEFAULT NULL, 
    field_id int(11) unsigned DEFAULT NULL, 
    PRIMARY KEY (relation_id), 
    UNIQUE KEY unique_row (field_id,client_id), 
    KEY client_id (client_id), 
    KEY flient_id (field_id) 
) ENGINE=InnoDB AUTO_INCREMENT=26 DEFAULT CHARSET=utf8;` 


`CREATE TABLE df_answers_text (
    answer_id int(11) unsigned NOT NULL AUTO_INCREMENT, 
    notes varchar(20000) DEFAULT NULL, 
    datetime_value datetime DEFAULT NULL, 
    date_value date DEFAULT NULL, 
    us_phone_number char(10) DEFAULT NULL, 
    field_id int(11) unsigned DEFAULT NULL, 
    related_to int(11) unsigned DEFAULT NULL, 
    created_by int(11) unsigned NOT NULL, 
    created_on datetime DEFAULT CURRENT_TIMESTAMP, 
    modified_by int(11) DEFAULT NULL, 
    modified_on datetime DEFAULT NULL, 
    big_unsigned_value bigint(20) DEFAULT NULL, 
    big_signed_value bigint(19) DEFAULT NULL, 
    unsigned_value int(11) DEFAULT NULL, 
    signed_value int(10) DEFAULT NULL, 
    decimal_value decimal(18,4) DEFAULT NULL, 
    PRIMARY KEY (answer_id), 
    UNIQUE KEY unique_answer (field_id,related_to), 
    KEY field_id (field_id), 
    KEY related_to (related_to), 
    KEY big_unsigned_value (big_unsigned_value), 
    KEY big_signed_value (big_signed_value), 
    KEY unsigned_value (unsigned_value), 
    KEY signed_value (signed_value), 
    KEY decimal_Value (decimal_value) 
) ENGINE=InnoDB AUTO_INCREMENT=2458748 DEFAULT CHARSET=utf8;` 

The query that takes the most time is the third subquery, the one aliased s3.

Here is the execution plan for the query that takes a long time (2 seconds):

[image: execution plan]

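I don't have any substantial help to offer, but my initial impression is that you might see some benefit from not having the potential partial cross joins in two of your subqueries. – Uueerdo

I have more questions than answers. Which indexes are defined? What are the relative sizes of the tables? How do you know the subqueries are always being used? Do you have an explain plan we can look at? – schtever

I suggest you provide proper CREATE and INSERT statements and the desired result. – Strawberry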

Answers

UNIQUE(a,b,c) 
INDEX (a) 

Drop the INDEX, since the UNIQUE key is itself an index and the INDEX is a prefix of the UNIQUE.
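
Applied to the table definitions in the question, this advice might look like the sketch below. It assumes no foreign keys or other queries depend on the dropped indexes, which the question does not show, so each statement should be checked before running it.

```sql
-- df_answers_text: unique_answer (field_id, related_to) already starts
-- with field_id, so the single-column index field_id adds nothing.
ALTER TABLE df_answers_text
    DROP INDEX field_id;

-- df_answer_predefined: un_row (field_id, option_id, related_to) starts
-- with field_id, and outter_view repeats the same columns in the same order.
ALTER TABLE df_answer_predefined
    DROP INDEX field_id,
    DROP INDEX outter_view;

-- df_field_to_client_relation: unique_row (field_id, client_id) starts
-- with field_id, so flient_id (field_id) is a redundant prefix.
ALTER TABLE df_field_to_client_relation
    DROP INDEX flient_id;
```

Fewer indexes also means less work on every INSERT, UPDATE, and DELETE against these tables.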

PRIMARY KEY(d) 
UNIQUE(a,b,c) 

Why have d at all? Simply say PRIMARY KEY(a,b,c).

FROM (SELECT ...) 
JOIN (SELECT ...) ON ... 

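This optimizes poorly (until 5.6.6). Whenever possible, turn JOIN (SELECT) into a JOIN with a table. As you suggested, using tmp tables may be better, provided you can add a suitable index to the tmp table. It is best to avoid having more than one "table" that is a subquery.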

In a many-to-many relation table, do not include an ID for the table; instead have only

PRIMARY KEY (a,b), -- for enforcing uniqueness, providing a PK, and going one direction 
INDEX  (b,a) -- for going the other way. 
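
As a sketch of that layout, here is how the field-to-client relation table from the question could be declared without its surrogate relation_id. The table name df_field_to_client_relation_v2 and the index name client_field are hypothetical, and the sketch assumes both columns can be made NOT NULL, which a primary key requires.

```sql
-- Hypothetical replacement for df_field_to_client_relation without relation_id.
CREATE TABLE df_field_to_client_relation_v2 (
    field_id  int unsigned NOT NULL,
    client_id int unsigned NOT NULL,
    -- Enforces uniqueness and serves lookups that start from field_id.
    PRIMARY KEY (field_id, client_id),
    -- Serves lookups that start from client_id.
    INDEX client_field (client_id, field_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
```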

The EXPLAIN does not seem to match the SELECT you provided. Each is useless without the other.

Another approach that might help... Instead of

SELECT ..., s2.foo, ... 
    ... 
    JOIN (SELECT ... FROM x WHERE ...) AS s2 ON s2.account_id = a.account_id 

see if you can reformulate it as:

SELECT ..., 
     (SELECT foo FROM x WHERE ... AND related = a.account_id) AS foo, ... 
    ... 

That is, use a correlated subquery in place of the JOIN subquery for the one value you need.
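
A minimal sketch of this reformulation for the s3 derived table from the question follows. It is an illustration under assumptions, not a drop-in replacement: the alias options_label is made up, the existence check against df_field_to_client_relation is left out, and the filters that reference s and s2 are omitted because a select-list alias cannot be used in WHERE.

```sql
SELECT a.account_name, a.account_type, a.status, a.account_id,
       -- Correlated subquery: computes the label list for one account at a time.
       -- GROUP_CONCAT over zero rows yields NULL, like the original LEFT JOIN.
       (SELECT CAST(GROUP_CONCAT(o.label SEPARATOR ' | ') AS CHAR(255))
          FROM df_answer_predefined AS p
          INNER JOIN df_fields_options AS o ON o.option_id = p.option_id
         WHERE p.related_to = a.account_id
           AND o.field_id = 16
           AND o.is_place_holder = 0
       ) AS options_label
FROM accounts AS a
INNER JOIN clients AS c ON c.client_id = a.client_id
WHERE c.client_id = 7 AND c.status = 'Active'
ORDER BY a.account_name
LIMIT 500;
```

The existing related_to index on df_answer_predefined can serve the correlated lookup. Whether this is actually faster depends on the data and the server version, so compare the EXPLAIN output of both forms.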

The bottom line is that the EAV model sucks.

Hmm... I don't think this is needed at all, since r is not used anywhere else in the query...

INNER JOIN df_field_to_client_relation AS r ON r.field_id = p.field_id 
    WHERE p.field_id = '19' AND r.client_id = '7' 

It seems to be equivalent to

WHERE EXISTS (SELECT * FROM df_field_to_client_relation 
       WHERE field_id = '19' AND client_id = '7') 

But why bother checking for existence at all?
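
For illustration, the first derived table from the question could be written in the EXISTS form as sketched below. The two forms should return the same rows here because unique_row (field_id, client_id) allows at most one matching relation row, so the original join can never multiply rows.

```sql
SELECT p.related_to AS account_id, p.decimal_value AS fieldValue
FROM df_answers_text AS p
WHERE p.field_id = 19
  -- Uncorrelated check: is field 19 assigned to client 7 at all?
  AND EXISTS (SELECT *
                FROM df_field_to_client_relation
               WHERE field_id = 19 AND client_id = 7);
```

If the application already guarantees that the field belongs to the client, the EXISTS clause can be dropped entirely, leaving a single-table lookup on unique_answer (field_id, related_to).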