2012-04-10

We have a simple, generic table structure, implemented in PostgreSQL (8.3; 9.1 is on the horizon). It seems a very straightforward and common setup. It boils down to this: event records are linked, through key/value pairs, to the records that actually match the search criteria:

events_event_types
(
    -- this table holds some 50 rows
    id bigserial                        -- PK
    "name" character varying(255)
)

events_events
(
    -- this table holds some 15M rows
    id bigserial                        -- PK
    datetime timestamp with time zone
    eventtype_id bigint                 -- FK to events_event_types.id
)

events_eventdetails
(
    -- this table holds some 65M rows
    id bigserial                        -- PK
    keyname character varying(255)
    "value" text
    event_id bigint                     -- FK to events_events.id
)

A few events_events rows and their events_eventdetails rows would look like this:

events_events                   | events_eventdetails
 id   datetime  eventtype_id    |  id    keyname        value         event_id
--------------------------------|-----------------------------------------------
 100  ...       10              |  1000  transactionId  9774ae16-...  100
                                |  1001  someKey        some value    100
 200  ...       20              |  2000  transactionId  9774ae16-...  200
                                |  2001  reductionId    123           200
                                |  2002  reductionId    456           200
 300  ...       30              |  3000  transactionId  9774ae16-...  300
                                |  3001  customerId     234           300
                                |  3002  companyId      345           300

We are after a "solution" that returns events_events rows 100 AND 200 AND 300 together, in a single result set, and FAST: when asked for reductionId=123, or when asked for customerId=234, or when asked for companyId=345. (There might also be interest in AND-combinations of these criteria, but that is not the goal.) Not sure whether it matters at this point, but the result set should also be filterable on a datetime range and on eventtype_id (an IN list), and take a LIMIT.

I put "solution" in quotes because it could be any of the following:

  • a single query
  • two smaller queries (I took this approach, as long as their intermediate results stayed small enough, and got stuck on companies (companyId) with a large number (~20K) of associated transactions (transactionId))
  • a modest redesign (e.g. denormalisation)

This is not a fresh question: we have tried all three approaches over the past few months (I won't bother you with those queries), but they all fail on performance. The solution should return in well under 1 second; previous attempts took roughly 10 seconds at best.

I would really appreciate some help; I'm rather at a loss right now...


The two-smaller-queries approach looks like this:

Query 1:

SELECT Substring(details2_transvalue.VALUE, 0, 32) 
    FROM events_eventdetails details2_transvalue 
    JOIN events_eventdetails compdetails ON details2_transvalue.event_id = compdetails.event_id 
    AND compdetails.keyname = 'companyId' 
    AND Substring(compdetails.VALUE, 0, 32) = '4' 
    AND details2_transvalue.keyname = 'transactionId' 

Query 2:

SELECT events1.* 
    FROM events_events events1 
    JOIN events_eventdetails compDetails ON events1.id = compDetails.event_id 
    AND compDetails.keyname='companyId' 
    AND substring(compDetails.value,0,32)='4' 
    WHERE events1.eventtype_id IN (...) 
UNION 
SELECT events2.* 
    FROM events_events events2 
    JOIN events_eventdetails details2_transKey ON events2.id = details2_transKey.event_id 
    AND details2_transKey.keyname='transactionId' 
    AND substring(details2_transKey.value,0,32) IN (-- result of query 1 goes here --) 
    WHERE events2.eventtype_id IN (...) 
    ORDER BY dateTime DESC LIMIT 50 

Performance degrades because query 1 returns a large set.

As you can see, values in the events_eventdetails table are always addressed as substrings of length 32, and we have indexed them that way. There are further indexes on keyname, on event_id, on event_id + keyname, and on keyname + the length-32 substring.
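For illustration, roughly what those indexes could look like (the substring index name is the one that shows up in the query plans below; the other names are assumed placeholders):

CREATE INDEX events_eventdetails_substring_ind
    ON events_eventdetails (keyname, substring(value, 0, 32));
CREATE INDEX events_eventdetails_keyname_ind   ON events_eventdetails (keyname);
CREATE INDEX events_eventdetails_event_id_ind  ON events_eventdetails (event_id);
CREATE INDEX events_eventdetails_event_key_ind ON events_eventdetails (event_id, keyname);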


Here is a PostgreSQL 9.1 approach, although I do not formally have that platform at my disposal:

WITH companyevents AS (
SELECT events1.* 
FROM events_events events1 
JOIN events_eventdetails compDetails 
ON events1.id = compDetails.event_id 
AND compDetails.keyname='companyId' 
AND substring(compDetails.value,0,32)=' -- my desired companyId -- ' 
WHERE events1.eventtype_id in (...) 
ORDER BY dateTime DESC 
LIMIT 50 
) 
SELECT * from events_events 
WHERE transaction_id IN (SELECT transaction_id FROM companyevents) 
OR id IN (SELECT id FROM companyevents) 
AND eventtype_id IN (...) 
ORDER BY dateTime DESC 
LIMIT 250; 

The query plan for a companyId with 28,228 transactionIds is as follows:

Limit (cost=7545.99..7664.33 rows=250 width=130) (actual time=210.100..3026.267 rows=50 loops=1) 
    CTE companyevents 
    -> Limit (cost=7543.62..7543.74 rows=50 width=130) (actual time=206.994..207.020 rows=50 loops=1) 
      -> Sort (cost=7543.62..7544.69 rows=429 width=130) (actual time=206.993..207.005 rows=50 loops=1) 
       Sort Key: events1.datetime 
       Sort Method: top-N heapsort Memory: 23kB 
       -> Nested Loop (cost=10.02..7529.37 rows=429 width=130) (actual time=0.093..178.719 rows=28228 loops=1) 
         -> Append (cost=10.02..1140.62 rows=657 width=8) (actual time=0.082..27.594 rows=28228 loops=1) 
          -> Bitmap Heap Scan on events_eventdetails compdetails (cost=10.02..394.47 rows=97 width=8) (actual time=0.021..0.021 rows=0 loops=1) 
            Recheck Cond: (((keyname)::text = 'companyId'::text) AND ("substring"(value, 0, 32) = '4'::text)) 
            -> Bitmap Index Scan on events_eventdetails_substring_ind (cost=0.00..10.00 rows=97 width=0) (actual time=0.019..0.019 rows=0 loops=1) 
             Index Cond: (((keyname)::text = 'companyId'::text) AND ("substring"(value, 0, 32) = '4'::text)) 
          -> Index Scan using events_eventdetails_companyid_substring_ind on events_eventdetails_companyid compdetails (cost=0.00..746.15 rows=560 width=8) (actual time=0.061..18.655 rows=28228 loops=1) 
            Index Cond: (((keyname)::text = 'companyId'::text) AND ("substring"(value, 0, 32) = '4'::text)) 
         -> Index Scan using events_events_pkey on events_events events1 (cost=0.00..9.71 rows=1 width=130) (actual time=0.004..0.004 rows=1 loops=28228) 
          Index Cond: (id = compdetails.event_id) 
          Filter: (eventtype_id = ANY ('{103,106,107,110,45,34,14,87,58,78,7,76,42,11,25,57,98,37,30,35,33,49,52,29,74,28,85,59,51,65,66,18,13,86,75,6,44,38,43,94,56,95,96,71,50,81,90,89,16,17,4,88,79,77,68,97,92,67,72,53,2,10,31,32,80,111,104,93,26,8,61,5,73,70,63,20,60,40,41,23,22,48,36,108,99,64,62,55,69,19,46,47,15,54,100,101,27,21,12,102,105,109,112,113,114,115,116,119,120,121,122,123,124,9,127,24,130,132,129,125,131,118,117,133,134}'::bigint[])) 
    -> Index Scan Backward using events_events_datetime_ind on events_events (cost=2.25..1337132.75 rows=2824764 width=130) (actual time=210.100..3026.255 rows=50 loops=1) 
     Filter: ((hashed SubPlan 2) OR ((hashed SubPlan 3) AND (eventtype_id = ANY ('{103,106,107,110,45,34,14,87,58,78,7,76,42,11,25,57,98,37,30,35,33,49,52,29,74,28,85,59,51,65,66,18,13,86,75,6,44,38,43,94,56,95,96,71,50,81,90,89,16,17,4,88,79,77,68,97,92,67,72,53,2,10,31,32,80,111,104,93,26,8,61,5,73,70,63,20,60,40,41,23,22,48,36,108,99,64,62,55,69,19,46,47,15,54,100,101,27,21,12,102,105,109,112,113,114,115,116,119,120,121,122,123,124,9,127,24,130,132,129,125,131,118,117,133,134}'::bigint[])))) 
     SubPlan 2 
      -> CTE Scan on companyevents (cost=0.00..1.00 rows=50 width=90) (actual time=206.998..207.071 rows=50 loops=1) 
     SubPlan 3 
      -> CTE Scan on companyevents (cost=0.00..1.00 rows=50 width=8) (actual time=0.001..0.026 rows=50 loops=1) 
Total runtime: 3026.410 ms 

The query plan for a companyId with 288 transactionIds is as follows:

Limit (cost=7545.99..7664.33 rows=250 width=130) (actual time=30.976..3790.362 rows=54 loops=1) 
    CTE companyevents 
    -> Limit (cost=7543.62..7543.74 rows=50 width=130) (actual time=9.263..9.290 rows=50 loops=1) 
      -> Sort (cost=7543.62..7544.69 rows=429 width=130) (actual time=9.263..9.272 rows=50 loops=1) 
       Sort Key: events1.datetime 
       Sort Method: top-N heapsort Memory: 24kB 
       -> Nested Loop (cost=10.02..7529.37 rows=429 width=130) (actual time=0.071..8.195 rows=1025 loops=1) 
         -> Append (cost=10.02..1140.62 rows=657 width=8) (actual time=0.060..1.348 rows=1025 loops=1) 
          -> Bitmap Heap Scan on events_eventdetails compdetails (cost=10.02..394.47 rows=97 width=8) (actual time=0.021..0.021 rows=0 loops=1) 
            Recheck Cond: (((keyname)::text = 'companyId'::text) AND ("substring"(value, 0, 32) = '5'::text)) 
            -> Bitmap Index Scan on events_eventdetails_substring_ind (cost=0.00..10.00 rows=97 width=0) (actual time=0.019..0.019 rows=0 loops=1) 
             Index Cond: (((keyname)::text = 'companyId'::text) AND ("substring"(value, 0, 32) = '5'::text)) 
          -> Index Scan using events_eventdetails_companyid_substring_ind on events_eventdetails_companyid compdetails (cost=0.00..746.15 rows=560 width=8) (actual time=0.039..1.006 rows=1025 loops=1) 
            Index Cond: (((keyname)::text = 'companyId'::text) AND ("substring"(value, 0, 32) = '5'::text)) 
         -> Index Scan using events_events_pkey on events_events events1 (cost=0.00..9.71 rows=1 width=130) (actual time=0.005..0.006 rows=1 loops=1025) 
          Index Cond: (id = compdetails.event_id) 
          Filter: (eventtype_id = ANY ('{103,106,107,110,45,34,14,87,58,78,7,76,42,11,25,57,98,37,30,35,33,49,52,29,74,28,85,59,51,65,66,18,13,86,75,6,44,38,43,94,56,95,96,71,50,81,90,89,16,17,4,88,79,77,68,97,92,67,72,53,2,10,31,32,80,111,104,93,26,8,61,5,73,70,63,20,60,40,41,23,22,48,36,108,99,64,62,55,69,19,46,47,15,54,100,101,27,21,12,102,105,109,112,113,114,115,116,119,120,121,122,123,124,9,127,24,130,132,129,125,131,118,117,133,134}'::bigint[])) 
    -> Index Scan Backward using events_events_datetime_ind on events_events (cost=2.25..1337132.75 rows=2824764 width=130) (actual time=30.975..3790.332 rows=54 loops=1) 
     Filter: ((hashed SubPlan 2) OR ((hashed SubPlan 3) AND (eventtype_id = ANY ('{103,106,107,110,45,34,14,87,58,78,7,76,42,11,25,57,98,37,30,35,33,49,52,29,74,28,85,59,51,65,66,18,13,86,75,6,44,38,43,94,56,95,96,71,50,81,90,89,16,17,4,88,79,77,68,97,92,67,72,53,2,10,31,32,80,111,104,93,26,8,61,5,73,70,63,20,60,40,41,23,22,48,36,108,99,64,62,55,69,19,46,47,15,54,100,101,27,21,12,102,105,109,112,113,114,115,116,119,120,121,122,123,124,9,127,24,130,132,129,125,131,118,117,133,134}'::bigint[])))) 
     SubPlan 2 
      -> CTE Scan on companyevents (cost=0.00..1.00 rows=50 width=90) (actual time=9.266..9.327 rows=50 loops=1) 
     SubPlan 3 
      -> CTE Scan on companyevents (cost=0.00..1.00 rows=50 width=8) (actual time=0.001..0.019 rows=50 loops=1) 
Total runtime: 3796.736 ms 

At 3 to 4 seconds this is not bad, but it is still a factor of 100+ too slow. Also, this was not run on the relevant hardware. Still, it should show where the pain is.


Here is something that might possibly grow into a solution:

An added table:

events_transaction_helper
(
    event_id bigint not null
    transactionid character varying(36) not null
    keyname character varying(255) not null
    value bigint not null
    -- indexed on (keyname, value)
)

For now I have filled this table "by hand", but a materialized-view implementation would do the trick. It would roughly follow this query:

SELECT tr.event_id, tr.value AS transactionid, det.keyname, det.value AS value 
    FROM events_eventdetails tr 
    JOIN events_eventdetails det ON det.event_id = tr.event_id 
    WHERE tr.keyname = 'transactionId' 
    AND det.keyname IN ('companyId', 'reduction_id', 'customer_id');
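Since neither 8.3 nor 9.1 has built-in materialized views, the "by hand" fill could simply be an INSERT ... SELECT along the same lines; the cast of det.value to bigint is an assumption here, because events_eventdetails.value is text while the helper column is bigint:

INSERT INTO events_transaction_helper (event_id, transactionid, keyname, value)
SELECT tr.event_id, tr.value AS transactionid, det.keyname, det.value::bigint AS value
    FROM events_eventdetails tr
    JOIN events_eventdetails det ON det.event_id = tr.event_id
    WHERE tr.keyname = 'transactionId'
    AND det.keyname IN ('companyId', 'reduction_id', 'customer_id');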

A column added to the events_events table:

transaction_id character varying(36) null 

The new column is filled as follows:

update events_events 
set transaction_id = 
    (select value from events_eventdetails 
     where keyname='transactionId' 
     and event_id=events_events.id); 
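The index names appearing in the plan below (testtransactionid and testmaterializedviewkeynamevalue) suggest indexes roughly like the following; the exact definitions are my assumption:

CREATE INDEX testtransactionid ON events_events (transaction_id);
CREATE INDEX testmaterializedviewkeynamevalue
    ON events_transaction_helper (keyname, value);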

Now, the following query consistently returns in < 15 ms:

explain analyze select * from events_events 
    where transactionId in 
    (select distinct transactionid 
     from events_transaction_helper 
     WHERE keyname='companyId' and value=5) 
    and eventtype_id in (...) 
    order by datetime desc limit 250; 

Limit (cost=5075.23..5075.85 rows=250 width=130) (actual time=8.901..9.028 rows=250 loops=1) 
    -> Sort (cost=5075.23..5077.19 rows=785 width=130) (actual time=8.900..8.953 rows=250 loops=1) 
     Sort Key: events_events.datetime 
     Sort Method: top-N heapsort Memory: 81kB 
     -> Nested Loop (cost=57.95..5040.04 rows=785 width=130) (actual time=0.928..8.268 rows=524 loops=1) 
       -> HashAggregate (cost=52.30..52.42 rows=12 width=37) (actual time=0.895..0.991 rows=276 loops=1) 
        -> Subquery Scan on "ANY_subquery" (cost=52.03..52.27 rows=12 width=37) (actual time=0.558..0.757 rows=276 loops=1) 
          -> HashAggregate (cost=52.03..52.15 rows=12 width=37) (actual time=0.556..0.638 rows=276 loops=1) 
           -> Index Scan using testmaterializedviewkeynamevalue on events_transaction_helper (cost=0.00..51.98 rows=22 width=37) (actual time=0.068..0.404 rows=288 loops=1) 
             Index Cond: (((keyname)::text = 'companyId'::text) AND (value = 5)) 
       -> Bitmap Heap Scan on events_events (cost=5.65..414.38 rows=100 width=130) (actual time=0.023..0.024 rows=2 loops=276) 
        Recheck Cond: ((transactionid)::text = ("ANY_subquery".transactionid)::text) 
        Filter: (eventtype_id = ANY ('{103,106,107,110,45,34,14,87,58,78,7,76,42,11,25,57,98,37,30,35,33,49,52,29,74,28,85,59,51,65,66,18,13,86,75,6,44,38,43,94,56,95,96,71,50,81,90,89,16,17,4,88,79,77,68,97,92,67,72,53,2,10,31,32,80,111,104,93,26,8,61,5,73,70,63,20,60,40,41,23,22,48,36,108,99,64,62,55,69,19,46,47,15,54,100,101,27,21,12,102,105,109,112,113,114,115,116,119,120,121,122,123,124,9,127,24,130,132,129,125,131,118,117,133,134}'::bigint[])) 
        -> Bitmap Index Scan on testtransactionid (cost=0.00..5.63 rows=100 width=0) (actual time=0.020..0.020 rows=2 loops=276) 
          Index Cond: ((transactionid)::text = ("ANY_subquery".transactionid)::text) 
Total runtime: 9.122 ms 

I will come back to this later and let you know whether this really turned into a viable solution :)


Please bear with it and post those queries, so that we know what you have already tried. – 2012-04-10 23:17:21


So your event detail criteria are a varchar(255) and a text field? You should pat yourself on the back for getting it down to 10 seconds. An Event_Keys table with an int and a varchar to normalise them, plus indexes, would be a start, but that text field is a problem though... – 2012-04-10 23:23:35


@Ben: posting those queries would make my comment far too long. Not sure how to go about that. – 2012-04-10 23:52:38

Answers


If you must use a design along these lines, you should eliminate the id column of events_eventdetails and declare the primary key to be (event_id, keyname). That gives you a very useful index without having to maintain a useless index for the synthetic key.
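A sketch of that change (it assumes each event has at most one detail row per keyname, which, as the comments below note, is not quite true for this data):

-- dropping the column also drops the old surrogate primary key
ALTER TABLE events_eventdetails DROP COLUMN id;
-- the composite key doubles as the useful (event_id, keyname) index
ALTER TABLE events_eventdetails ADD PRIMARY KEY (event_id, keyname);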

Better yet, eliminate the events_eventdetails table entirely and use an hstore column with a GIN index for that data. That might get you to your performance goal without having to define the stored event details up front.
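A minimal sketch of the hstore variant (table and index names are my own; hstore ships as a contrib module/extension):

CREATE EXTENSION hstore;  -- PostgreSQL 9.1+; on 8.3 install the contrib module instead

CREATE TABLE events_events_hs (
    id           bigserial PRIMARY KEY,
    datetime     timestamp with time zone,
    eventtype_id bigint,
    details      hstore    -- replaces the events_eventdetails rows
);

CREATE INDEX events_events_hs_details_gin ON events_events_hs USING gin (details);

-- "events for companyId 345" becomes a containment query:
SELECT * FROM events_events_hs WHERE details @> 'companyId=>345';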

Better still, if you can predict or specify the possible event details, don't try to implement a database within the database. Put each "keyname" value into its own column of events_events, with a data type appropriate to the nature of that data. That will probably allow much faster access, at the cost of having to issue ALTER TABLE statements as the nature of the details changes.


To keep the scope of the question limited, you will understand that I simplified the description of what we have implemented so far. This probably won't work, because the events_eventdetails table actually holds not just keyname/value but keyname/index/value, allowing multiple duplicate keys per event. hstore does sound like a nice solution for pure key/value pairs, though. Still a lot to learn in PostgreSQL! – 2012-04-11 00:05:53


It is not clear to me what you mean by keyname/index/value. Could you clarify? It is hard to suggest a solution without understanding the actual problem. – kgrittn 2012-04-11 02:47:37


After having had plain key/value pairs for a while, we found we needed ordered duplicate keys, and now use a listindex column (integer) on the events_eventdetails table alongside the keyname and value columns (the original key/value pair). – 2012-04-11 18:49:29


If more than roughly 7-10% of all rows in the events_eventdetails table satisfy your key (reductionId in this case), PostgreSQL will prefer a SeqScan. There is nothing you can do about it; it is the fastest approach.

I had a similar case with ISO 8583 packets. Each packet consists of 128 fields (by design), so the first database design followed your approach with 2 tables:

  • field_id and description in one table (events_events in your case),
  • field_id + field_value in the other (events_eventdetails).

Although such a layout follows 3NF, we hit the same problems immediately:

  • poor performance,
  • highly complicated queries.

In your case you should go for a redesign. One option (the easier one) is to make events_eventdetails.keyname a smallint, which will make the comparison operations faster. Not a big win, though.
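A minimal sketch of that option (my own illustration; the lookup table and column names are assumptions, not part of the answer):

CREATE TABLE eventdetail_keys (
    id      smallint PRIMARY KEY,
    keyname character varying(255) UNIQUE
);

ALTER TABLE events_eventdetails ADD COLUMN key_id smallint REFERENCES eventdetail_keys(id);
-- backfill key_id from keyname, then drop the varchar column and compare on the
-- smallint, e.g. WHERE key_id = 3 instead of WHERE keyname = 'companyId'.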

The other option is to collapse the 2 tables into a single one, something like this:

CREATE TABLE events_events (
    id   bigserial, 
    datetime  timestamp with time zone, 
    eventtype_id bigint, 
    transactionId text, -- value for transactionId 
    reductionId text, -- -"-  reductionId 
    companyId  text, -- etc. 
    customerId text, 
    anyotherId text, 
    ... 
); 
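As an illustration (my own sketch, not part of the answer), lookups against such a layout become direct, and each *Id column can get its own, even partial, index:

-- partial index only over rows that actually carry a reductionId
CREATE INDEX events_events_reductionid_ind
    ON events_events (reductionId)
    WHERE reductionId IS NOT NULL;

SELECT *
    FROM events_events
    WHERE reductionId = '123'
    ORDER BY datetime DESC
    LIMIT 250;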

This breaks 3NF, but on the other hand:

  • you have more freedom in how you index your data;
  • your queries will be shorter and easier to maintain;
  • performance will be much better.

Possible drawbacks:

  • you waste a little more space on the unused fields: unused_fields/8 bytes per row;
  • you may still need an extra table for event details that are too infrequent to deserve a dedicated column.

EDIT:

I don't quite understand what you mean by "materialise" things here.

In your question you mentioned that you want a:

"solution" that returns events_events rows 100 AND 200 AND 300 together in a single result set, and FAST, when asked for reductionId=123, or when asked for customerId=234, or when asked for companyId=345.

The suggested redesign creates a crosstab (pivot) out of your events_eventdetails. And to get all the events_events rows satisfying your conditions, you can use:

SELECT * 
    FROM events_events 
WHERE id IN (100, 200, 300) 
    AND reductionId = 123 
-- AND customerId = 234 
-- AND companyId = 345; 

Thank you. I still don't see how it addresses the essence of the problem: you first have to materialise the events that match the real WHERE condition, and only then can you query for the events that have the same transactionId as those in the previously materialised result set. – 2012-04-11 15:11:24


So far, with every query design, I seem to end up with one part of the query that essentially returns all rows matching the basic WHERE (e.g. companyId=345). That subset can be large (~20k) because ORDER and LIMIT cannot be applied to it (it seems). Picking a companyId that returns only a small subset speeds the query up enormously. I called that intermediate subset "materialising" (maybe that is an incorrect or confusing term). – 2012-04-12 17:20:44


Even denormalising transactionId/companyId still yields that large intermediate subset. And if you are also suggesting to update companyId onto the rows that would be reached indirectly (via transactionId), so that I no longer need the transactionId indirection, that will not work: such an indirectly reached row may already have a (different) companyId, and keys are not unique (I updated my original post to show this, see reductionId). – 2012-04-12 17:22:20


The idea is not to denormalise, but to normalise further. The events_details table can be replaced by two tables: one holding the event_detail_types, and one holding the actual values (referencing {event_id, detail_type_id}). That makes query execution easier, because only numeric IDs have to be fetched and compared to select the detail_types; the gain comes from the reduction in the number of pages the DBMS has to read, since all the key names need to be stored + retrieved + compared only once.

Note: I changed the naming a bit, mostly for reasons of sanity and safety.

SET search_path='cav'; 
/**** ***/ 
DROP SCHEMA cav CASCADE; 
CREATE SCHEMA cav; 
SET search_path='cav'; 

CREATE TABLE event_types 
(
    -- this table holds some 50 rows 
    id bigserial PRIMARY KEY 
    , zname varchar(255) 
); 
INSERT INTO event_types(zname) 
SELECT 'event_'::text || gs::text 
FROM generate_series (1,100) gs 
     ; 

CREATE TABLE events 
(
    -- this table holds some 15M rows 
    id bigserial PRIMARY KEY 
    , zdatetime timestamp with time zone 
    , eventtype_id bigint REFERENCES event_types(id) 
); 
INSERT INTO events(zdatetime,eventtype_id) 
SELECT gs, et.id 
FROM generate_series ('2012-04-11 00:00:00'::timestamp 
        , '2012-04-12 12:00:00'::timestamp ,' 1 hour'::interval) gs 
     , event_types et 
     ; 

-- SELECT * FROM event_types; 
-- SELECT * FROM events; 

CREATE TABLE event_details 
(
    -- this table holds some 65M rows 
    id bigserial PRIMARY KEY 
    , event_id bigint REFERENCES events(id) 
    , keyname varchar(255) 
    , zvalue text 
); 

INSERT INTO event_details(event_id, keyname) 
SELECT ev.id,im.* 
FROM events ev 
     , (VALUES ('transactionId'::text),('someKey'::text) 
      ,('reductionId'::text),('customerId'::text),('companyId'::text) 
     ) im 
     ; 
UPDATE event_details 
SET zvalue = 'Some_value'::text || (random() * 1000)::int::text 
     ; 
     -- 
     -- Domain table with all valid detail_types 
     -- 
CREATE TABLE detail_types(
     id bigserial PRIMARY KEY 
     , keyname varchar(255) 
     ); 
INSERT INTO detail_types(keyname) 
SELECT DISTINCT keyname 
     FROM event_details 
     ; 

     -- 
     -- Context-attribute-value table, referencing {event_id, type_id} 
     -- 
CREATE TABLE event_detail_values 
     (event_id BIGINT 
     , detail_type_id BIGINT 
     , zvalue text 
     , PRIMARY KEY(event_id , detail_type_id) 
     , FOREIGN KEY(event_id) REFERENCES events(id) 
     , FOREIGN KEY(detail_type_id)REFERENCES detail_types(id) 
     ); 

     -- 
     -- For the sake of joining we create some natural keys 
     -- 
CREATE INDEX events_details_keyname ON event_details (keyname) ; 
CREATE INDEX detail_types_keyname ON detail_types(keyname) ; 

INSERT INTO event_detail_values (event_id,detail_type_id, zvalue) 
     SELECT ed.event_id, dt.id 
       , ed.zvalue 
     FROM event_details ed 
     , detail_types dt 
     WHERE ed.keyname = dt.keyname 
     ; 
     -- 
     -- Now we can drop the original table, and use the view instead 
     -- 
DROP TABLE event_details; 
CREATE VIEW event_details AS (
     SELECT dv.event_id AS event_id 
       , dt.keyname AS keyname 
       , dv.zvalue AS zvalue 
     FROM event_detail_values dv 
     JOIN detail_types dt ON dt.id = dv.detail_type_id 
     ); 
EXPLAIN ANALYZE 
SELECT ev.id AS event_id 
     , ev.zdatetime AS zdatetime 
     , ed.keyname AS keyname 
     , ed.zvalue AS zevalue 
     FROM events ev 
     JOIN event_details ed ON ed.event_id = ev.id 
     WHERE ed.keyname IN ('transactionId','customerId','companyId') 
     ORDER BY event_id,keyname 
     ; 

The resulting query plan:

                QUERY PLAN                 
---------------------------------------------------------------------------------------------------------------------------------------------- 
Sort (cost=1178.79..1197.29 rows=7400 width=40) (actual time=159.902..177.379 rows=11100 loops=1) 
    Sort Key: ev.id, dt.keyname 
    Sort Method: external sort Disk: 560kB 
    -> Hash Join (cost=108.34..703.22 rows=7400 width=40) (actual time=12.225..122.231 rows=11100 loops=1) 
     Hash Cond: (dv.event_id = ev.id) 
     -> Hash Join (cost=1.09..466.47 rows=7400 width=32) (actual time=0.047..74.183 rows=11100 loops=1) 
       Hash Cond: (dv.detail_type_id = dt.id) 
       -> Seq Scan on event_detail_values dv (cost=0.00..322.00 rows=18500 width=29) (actual time=0.006..26.543 rows=18500 loops=1) 
       -> Hash (cost=1.07..1.07 rows=2 width=19) (actual time=0.025..0.025 rows=3 loops=1) 
        Buckets: 1024 Batches: 1 Memory Usage: 1kB 
        -> Seq Scan on detail_types dt (cost=0.00..1.07 rows=2 width=19) (actual time=0.009..0.014 rows=3 loops=1) 
          Filter: ((keyname)::text = ANY ('{transactionId,customerId,companyId}'::text[])) 
     -> Hash (cost=61.00..61.00 rows=3700 width=16) (actual time=12.161..12.161 rows=3700 loops=1) 
       Buckets: 1024 Batches: 1 Memory Usage: 131kB 
       -> Seq Scan on events ev (cost=0.00..61.00 rows=3700 width=16) (actual time=0.004..5.926 rows=3700 loops=1) 
Total runtime: 192.724 ms 
(16 rows) 

As you can see, the "deepest" part of the query is retrieving the detail_type_ids for the given list of strings. These are put into a hash table, which is then combined with the corresponding hash of the detail_values. (NB: this is pg-9.1)

YMMV。
