2017-06-13 158 views
0

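Joining multiple tables with the same structure but different data

I'm trying to join 8 tables, and because each table has more than 500,000 rows, the query is very slow. What is the best way to join these tables?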

All tables have this structure:

data_temprature:

+--------+---------+-------+------------------+
| ID_geo | NAME    | Value | Date             |
+--------+---------+-------+------------------+
| 10005  | Madrid  | 32    | 2017-06-12 08:00 |
| 10005  | Madrid  | 25    | 2017-06-12 09:00 |
| 12701  | Paris   | 23    | 2017-06-12 08:00 |
| 13006  | Tokyo   | 25    | 2017-06-12 11:00 |
| 11132  | Sevilla | 27    | 2017-06-12 16:00 |
| 21333  | London  | 22    | 2017-06-12 17:00 |
+--------+---------+-------+------------------+

data_WeatherSimbol

+--------+---------+-------+------------------+
| ID_geo | NAME    | Value | Date             |
+--------+---------+-------+------------------+
| 10005  | Madrid  | A+    | 2017-06-12 08:00 |
| 10005  | Madrid  | A     | 2017-06-12 09:00 |
| 12701  | Paris   | A-    | 2017-06-12 08:00 |
| 13006  | Tokyo   | C-    | 2017-06-12 11:00 |
| 11132  | Sevilla | I+    | 2017-06-12 16:00 |
| 21333  | London  | D-    | 2017-06-12 17:00 |
+--------+---------+-------+------------------+

I want to write a join that returns a result like this:

+--------+---------+-------------+----------+------------------+
| ID_geo | NAME    | Temperature | Simboles | Date             |
+--------+---------+-------------+----------+------------------+
| 10005  | Madrid  | 32          | A+       | 2017-06-12 08:00 |
| 10005  | Madrid  | 25          | A        | 2017-06-12 09:00 |
| 12701  | Paris   | 23          | A-       | 2017-06-12 08:00 |
| 13006  | Tokyo   | 25          | C-       | 2017-06-12 11:00 |
| 11132  | Sevilla | 27          | I+       | 2017-06-12 16:00 |
| 21333  | London  | 22          | D-       | 2017-06-12 17:00 |
+--------+---------+-------------+----------+------------------+

Thanks.

UPDATE, real data provided:

Execution plan: https://files.fm/u/b4besk27

This is the query:

SELECT 
    cielo.data_value AS cielo, 
    lluv.data_value AS lluvia, 
    temp.data_value AS temp, 
    vientos.data_value AS viento, 
    tmin.data_value AS tempmin, 
    tmax.data_value AS tempmax, 
    cielo.data_date AS DiaPrev 
FROM 
    data_cielo AS cielo 
INNER JOIN data_lluvia AS lluv ON cielo.data_geo = lluv.data_geo 
INNER JOIN data_presion AS pres ON cielo.data_geo = pres.data_geo 
INNER JOIN data_temp AS temp ON cielo.data_geo = temp.data_geo 
LEFT JOIN data_tempmax AS tmax ON cielo.data_geo = tmax.data_geo 
LEFT JOIN data_tempmin AS tmin ON cielo.data_geo = tmin.data_geo 
INNER JOIN data_viento AS vientos ON cielo.data_geo = vientos.data_geo 

WHERE 
    cielo.data_date = lluv.data_date 
AND pres.data_date = cielo.data_date 
AND vientos.data_date = pres.data_date 
AND temp.data_date = vientos.data_date 
AND cielo.data_geo = 46 ORDER BY cielo.data_date; 
And this is the result:

E+ 0.0461028 29.6937088 S2 19.408 36.39 2017-06-13 12:00:00.000 
E+ 0.0461028 29.6937088 S2 21.422 36.39 2017-06-13 12:00:00.000 
E+ 0.0461028 29.6937088 S2 19.408 37.853 2017-06-13 12:00:00.000 
E+ 0.0461028 29.6937088 S2 21.422 37.853 2017-06-13 12:00:00.000 
E+ 0.0461028 30.7593854 S2 19.408 36.39 2017-06-13 13:00:00.000 
E+ 0.0461028 30.7593854 S2 21.422 36.39 2017-06-13 13:00:00.000 
E+ 0.0461028 30.7593854 S2 19.408 37.853 2017-06-13 13:00:00.000 
E+ 0.0461028 30.7593854 S2 21.422 37.853 2017-06-13 13:00:00.000 
A+ 0.0461028 31.6310774 SSW2 19.408 36.39 2017-06-13 14:00:00.000 
A+ 0.0461028 31.6310774 SSW2 21.422 36.39 2017-06-13 14:00:00.000 
A+ 0.0461028 31.6310774 SSW2 19.408 37.853 2017-06-13 14:00:00.000 
A+ 0.0461028 31.6310774 SSW2 21.422 37.853 2017-06-13 14:00:00.000 
A 0.0461028 32.2647927 S2 19.408 36.39 2017-06-13 15:00:00.000 
A 0.0461028 32.2647927 S2 21.422 36.39 2017-06-13 15:00:00.000 
A 0.0461028 32.2647927 S2 19.408 37.853 2017-06-13 15:00:00.000 

It shouldn't produce this. The result I need is, as I said, one row per hour with the data values for temperature, pressure, precipitation, sky, ...

+0

IMO, this is a bad design without any normalization.

+0

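@PrabhatG That's because the data is inserted from txt files into 8 tables (one per meteorological variable). I don't know why they designed it this way, but that's how it is. Any suggestions?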

+0

Try creating an index on ID_Geo. That should reduce the query execution time. – Debabrata

Answers

0

I think you can just join on both geo and date:

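-- Both tables store their reading in the Value column, so alias each one
select t.ID_geo, t.NAME,
       t.Value as Temperature, ws.Value as Simboles, t.[Date]
from data_temprature t join 
     data_WeatherSimbol ws 
     on t.ID_geo = ws.ID_geo and t.[Date] = ws.[Date]; 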
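
Applied to the real query from the update, the same idea might look like the sketch below. It assumes every data_* table has the data_geo, data_date, and data_value columns seen in the question, and that data_tempmin and data_tempmax hold one row per data_geo and data_date at the same granularity as the other tables. If they hold daily values instead, the date condition on those two joins would need to compare calendar dates rather than full timestamps. The original query does not match tmin and tmax on date at all, which would explain why each hour comes back four times (two tempmin values times two tempmax values).

```sql
-- Sketch only: table and column names are copied from the question's query.
SELECT
    cielo.data_value   AS cielo,
    lluv.data_value    AS lluvia,
    temp.data_value    AS temp,
    vientos.data_value AS viento,
    tmin.data_value    AS tempmin,
    tmax.data_value    AS tempmax,
    cielo.data_date    AS DiaPrev
FROM data_cielo AS cielo
INNER JOIN data_lluvia AS lluv
        ON lluv.data_geo = cielo.data_geo
       AND lluv.data_date = cielo.data_date
INNER JOIN data_presion AS pres
        ON pres.data_geo = cielo.data_geo
       AND pres.data_date = cielo.data_date
INNER JOIN data_temp AS temp
        ON temp.data_geo = cielo.data_geo
       AND temp.data_date = cielo.data_date
INNER JOIN data_viento AS vientos
        ON vientos.data_geo = cielo.data_geo
       AND vientos.data_date = cielo.data_date
-- Assumption: tmax/tmin share the same data_date values as data_cielo.
LEFT JOIN data_tempmax AS tmax
       ON tmax.data_geo = cielo.data_geo
      AND tmax.data_date = cielo.data_date
LEFT JOIN data_tempmin AS tmin
       ON tmin.data_geo = cielo.data_geo
      AND tmin.data_date = cielo.data_date
WHERE cielo.data_geo = 46
ORDER BY cielo.data_date;
```

Putting the date condition inside each ON clause also keeps the LEFT JOINs behaving as outer joins, since nothing about tmax or tmin is filtered in the WHERE clause.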
+0

That's exactly the problem: it's super slow.

+0

Why would a 'join' be "super slow"?

+0

I guess because of the many joins. Would it be better to create a view over these tables? Or to handle it with a clustered index?

0

Try this

;With data_temprature(ID_geo,NAME,Value,[Date]) 
AS 
(
SELECT 10005 , 'Madrid' , 32 , '2017-06-12 08:00' Union all 
SELECT 10005 , 'Madrid' , 25 , '2017-06-12 09:00' Union all 
SELECT 12701 , 'Paris' , 23 , '2017-06-12 08:00' Union all 
SELECT 13006 , 'Tokyo' , 25 , '2017-06-12 11:00' Union all 
SELECT 11132 , 'Sevilla' , 27 , '2017-06-12 16:00' Union all 
SELECT 21333 , 'London' , 22 , '2017-06-12 17:00' 
) 
,data_WeatherSimbol(ID_geo,NAME,Value,[Date]) 
AS 
(
SELECT 10005 , 'Madrid' , 'A+' , '2017-06-12 08:00' Union all 
SELECT 10005 , 'Madrid' , 'A' , '2017-06-12 09:00' Union all 
SELECT 12701 , 'Paris' , 'A-' , '2017-06-12 08:00' Union all 
SELECT 13006 , 'Tokyo' , 'C-' , '2017-06-12 11:00' Union all 
SELECT 11132 , 'Sevilla' , 'I+' , '2017-06-12 16:00' Union all 
SELECT 21333 , 'London' , 'D-' , '2017-06-12 17:00' 
) 
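SELECT ID_geo, 
     NAME, 
     Temperature, 
     Symboles, 
     [Date] 
FROM 
(
SELECT t.ID_geo, 
     t.NAME, 
     t.Value AS Temperature, 
     w.Value AS Symboles, 
     t.[Date], 
     -- Guard against duplicate rows for the same location and time
     ROW_NUMBER() OVER (PARTITION BY t.ID_geo, t.[Date] ORDER BY t.[Date]) AS Rno 
FROM data_temprature t 
INNER JOIN data_WeatherSimbol w 
     -- Match on location and time so each reading pairs with its own symbol
     ON t.ID_geo = w.ID_geo 
     AND t.[Date] = w.[Date] 
) Dt 
WHERE Dt.Rno = 1 
ORDER BY ID_geo, [Date] 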
0

Neither [ID_geo] nor [Date] alone seems to be unique enough for the join, so:

  1. Create a two-column index on every table, like

    create index IX_data_temprature on data_temprature ([ID_geo], [Date])

  2. Join all the tables on both [ID_geo] and [Date]
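
Before relying on ([ID_geo], [Date]) as the join key, it may be worth checking that the pair really is unique in each table. A minimal sketch for one table, using the column names from the question's sample data (the row_count alias is just illustrative):

```sql
-- Lists every (ID_geo, Date) pair that occurs more than once.
-- Any pair returned here would multiply rows in the joined result.
SELECT [ID_geo], [Date], COUNT(*) AS row_count
FROM data_temprature
GROUP BY [ID_geo], [Date]
HAVING COUNT(*) > 1;
```

If this returns no rows for any of the tables, the pair can serve as the join key, and the index from step 1 could probably be declared with CREATE UNIQUE INDEX instead.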

0

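Most of the query's cost comes from RID lookups.

A RID lookup is used when the index does not cover the query (SQL Server has to look up the values in the table because they are not included in the index) and the index is nonclustered on a table that has no clustered index (a heap).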

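The query will probably be faster with a covering index; it looks like the value column is not included in your current index. More about included columns can be found in the Microsoft docs.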
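
As a rough sketch, a covering index for one of the tables in the question could look like this. The index name is hypothetical, and the column names are taken from the question's query; the same statement would be repeated for each data_* table:

```sql
-- Key columns serve the join and the filter; INCLUDE carries data_value
-- in the index leaf so the query no longer needs a lookup into the table.
CREATE NONCLUSTERED INDEX IX_data_cielo_geo_date   -- hypothetical name
    ON data_cielo (data_geo, data_date)
    INCLUDE (data_value);
```

With data_value included, the index alone can answer a query that filters on data_geo, joins on data_date, and selects data_value.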

Changing the nonclustered index to a clustered index might also help.
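
A minimal sketch of that alternative, again with a hypothetical index name and the column names from the question's query:

```sql
-- A clustered index stores the table rows themselves in key order,
-- so no separate lookup is needed after the index seek.
CREATE CLUSTERED INDEX CIX_data_cielo_geo_date   -- hypothetical name
    ON data_cielo (data_geo, data_date);
```

A table can have only one clustered index, and building it on a large table rewrites the whole table, so this is probably best done outside busy hours.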
