2015-07-04 100 views
1

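Identifying differences between two MySQL database schemas

I want to find any tables or fields that exist in database AA but are missing from database BB. I am using INFORMATION_SCHEMA.columns to get the information, so I wrote a 'missing records' query to find them. For testing I used 2 databases where I knew BB was missing 1 table and, in another table, 1 field.

Here is my first attempt: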

SELECT AA.table_name, 
     AA.column_name, 
     BB.table_name, 
     BB.column_name 
FROM information_schema.columns AS AA 
     LEFT JOIN information_schema.columns AS BB 
       ON (AA.table_name = bb.table_name) 
       AND (AA.column_name = BB.column_name) 
WHERE AA.table_schema = 'wireless-2015-05' 
    AND BB.table_schema = 'wireless-2015-04' 
    AND BB.column_name IS NULL 

This returns 0 records. So then I tried:

SELECT AA.table_name, 
     AA.column_name 
FROM information_schema.columns AS AA 
WHERE AA.table_schema = 'wireless-2015-04' 
    AND NOT EXISTS(SELECT BB.table_name, 
         BB.column_name 
        FROM information_schema.columns AS BB 
        WHERE BB.table_schema = 'wireless-2015-05') 

Again I got 0 records. Finally I tried this:

SELECT table_name, 
     column_name 
FROM (SELECT DISTINCT table_name, 
         column_name 
     FROM information_schema.columns 
     WHERE table_schema = 'wireless-2015-04' 
     UNION ALL 
     SELECT DISTINCT table_name, 
         column_name 
     FROM information_schema.columns 
     WHERE table_schema = 'wireless-2015-05') AS tbl 
GROUP BY table_name, 
      column_name 
HAVING Count(*) = 1 

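This produced the expected results.

While I don't mind using the third query, I can't figure out why the first two don't work. I'd like to know for future reference. Can anyone spot the problem?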


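Update:
For those interested, here are 4 working queries, along with the time it took to run each. They are listed fastest first, with the time shown below each query.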

SELECT AA.table_name, 
     AA.column_name 
FROM information_schema.columns AS AA 
     LEFT JOIN (SELECT table_name, 
         column_name 
        FROM information_schema.columns 
        WHERE table_schema = 'wireless-2015-04') BB 
       ON AA.table_name = BB.table_name 
       AND AA.column_name = BB.column_name 
WHERE AA.table_schema = 'wireless-2015-05' 
     AND BB.table_name IS NULL; 

0.047 seconds

SELECT table_name, 
     column_name 
FROM (SELECT DISTINCT table_name, 
         column_name 
     FROM information_schema.columns 
     WHERE table_schema = 'wireless-2015-04' 
     UNION ALL 
     SELECT DISTINCT table_name, 
         column_name 
     FROM information_schema.columns 
     WHERE table_schema = 'wireless-2015-05') AS tbl 
GROUP BY table_name, 
      column_name 
HAVING Count(*) = 1; 

0.078 seconds

SELECT DISTINCT table_name, 
       column_name, 
       Concat(table_name, '--', column_name) AS tc 
FROM information_schema.columns 
WHERE table_schema = 'wireless-2015-05' 
HAVING tc NOT IN(SELECT DISTINCT Concat(table_name, '--', column_name) 
       FROM information_schema.columns 
       WHERE table_schema = 'wireless-2015-04'); 

0.125 seconds (a new solution I thought of this morning)

SELECT aa.table_name, 
     aa.column_name 
FROM information_schema.columns aa 
WHERE table_schema = 'wireless-2015-05' 
     AND NOT EXISTS (SELECT 1 
         FROM information_schema.columns 
         WHERE table_schema = 'wireless-2015-04' 
           AND table_name = aa.table_name 
           AND column_name = aa.column_name); 

44.382 seconds. Clearly not a good real-world solution.

+0

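information_schema is relatively expensive to query, because these tables are not real tables, and queries often examine more of the internal structures than the query actually needs. This helps explain why the first query is faster: "LEFT JOIN (SELECT ...) BB" actually creates a temporary table "BB" *first*, so the second table in the query is fully populated before the outer query runs. Contrast that with the very slow variant shown last, which probably issues a request to i_s for each column.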

Answers

1

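Suppose the records look like this:

   schema           table column
   ---------------- ----- ------
1. wireless-2015-05 T1    F1
2. wireless-2015-05 T1    F2
3. wireless-2015-05 T2    F1
4. wireless-2015-04 T1    F1

Note that wireless-2015-04 is missing table T2 and column T1.F2. We'll use this sample in the explanation and in the SQL Fiddle examples. You were quite close in your first two attempts. A small change to each (included below) is all it takes to fix them.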

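Query 1

Let's break down the first query. We'll leave out the WHERE clause, since the sample above contains only the two schemas mentioned in the WHERE clause.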

SELECT ... 
FROM information_schema.columns AS AA 
LEFT JOIN information_schema.columns AS BB 
    on aa.table_name = bb.table_name 
    and aa.column_name = bb.column_name 

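The first record, wireless-2015-05 + T1 + F1, is matched (on table and column name) against all records in the same table. So:

  • Record #1 of AA will match records #1 and #4 of BB
  • Record #2 of AA will match record #2 of BB
  • Record #3 of AA will match record #3 of BB
  • Record #4 of AA will match records #1 and #4 of BB

Example: http://sqlfiddle.com/#!9/6b704/4

No joined record will have a NULL BB.column_name, so no records are returned. That, however, is not what you are looking for.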
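To reproduce the matching described above without a MySQL server, here is a minimal sketch. It assumes an in-memory SQLite table as a stand-in for information_schema.columns; the table name cols is made up for the illustration, and only the four sample rows are loaded.

```python
import sqlite3

# Stand-in for information_schema.columns: an in-memory SQLite table named
# "cols" (a hypothetical name) holding the four sample rows from the answer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cols (table_schema TEXT, table_name TEXT, column_name TEXT)"
)
conn.executemany(
    "INSERT INTO cols VALUES (?, ?, ?)",
    [
        ("wireless-2015-05", "T1", "F1"),
        ("wireless-2015-05", "T1", "F2"),
        ("wireless-2015-05", "T2", "F1"),
        ("wireless-2015-04", "T1", "F1"),
    ],
)

# The self-join from query 1 with the WHERE clause left out.
joined = conn.execute(
    """
    SELECT AA.table_schema, AA.table_name, AA.column_name, BB.table_schema
    FROM cols AS AA
    LEFT JOIN cols AS BB
      ON AA.table_name = BB.table_name
     AND AA.column_name = BB.column_name
    ORDER BY AA.table_schema, AA.table_name, AA.column_name, BB.table_schema
    """
).fetchall()

# Rows whose BB side is NULL, i.e. rows the LEFT JOIN could not match.
unmatched = [row for row in joined if row[3] is None]

for row in joined:
    print(row)
print("joined rows:", len(joined), "unmatched rows:", len(unmatched))
```

Every AA row finds at least itself on the BB side, so the join yields six rows and none of them is NULL-extended. That is why a test on BB.column_name IS NULL has nothing to pick up.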

Query 1 improved

You can rewrite query 1 along these lines to get the correct result:

SELECT AA.table_name, 
     AA.column_name 
FROM information_schema.columns AS AA 
LEFT JOIN 
( 
    select table_name, column_name from 
    information_schema.columns 
    where table_schema = 'wireless-2015-04' 
) BB 
    on AA.table_name = BB.table_name 
    and AA.column_name = BB.column_name 
WHERE 
    AA.table_schema = 'wireless-2015-05' 
    and BB.table_name is null 

Example: http://sqlfiddle.com/#!9/6b704/10

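Query 2

Basically, the NOT EXISTS subquery in query 2 is missing a clause that ties it to AA's columns. Without it, the subquery returns rows whenever the other schema has any columns at all, so NOT EXISTS is false for every row of AA and you get no results.

Query 2 improved

The query can be corrected along these lines: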

select aa.table_name, aa.column_name 
from information_schema.columns aa 
where table_schema = 'wireless-2015-05' 
and not exists (
    select 1 
    from information_schema.columns 
    where table_schema = 'wireless-2015-04' 
    and table_name = aa.table_name 
    and column_name = aa.column_name 
); 

Example: http://sqlfiddle.com/#!9/6b704/9
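The difference between the uncorrelated and the correlated NOT EXISTS can also be checked locally. The sketch below again assumes an in-memory SQLite table (hypothetically named cols) in place of information_schema.columns, loaded with the four sample rows; it is an illustration of the logic, not a MySQL benchmark.

```python
import sqlite3

# Hypothetical stand-in for information_schema.columns with the sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cols (table_schema TEXT, table_name TEXT, column_name TEXT)"
)
conn.executemany(
    "INSERT INTO cols VALUES (?, ?, ?)",
    [
        ("wireless-2015-05", "T1", "F1"),
        ("wireless-2015-05", "T1", "F2"),
        ("wireless-2015-05", "T2", "F1"),
        ("wireless-2015-04", "T1", "F1"),
    ],
)

# Shape of the original query 2: the subquery never refers to AA, so it is
# evaluated the same way for every outer row.
uncorrelated = conn.execute(
    """
    SELECT AA.table_name, AA.column_name
    FROM cols AS AA
    WHERE AA.table_schema = 'wireless-2015-04'
      AND NOT EXISTS (SELECT BB.table_name, BB.column_name
                      FROM cols AS BB
                      WHERE BB.table_schema = 'wireless-2015-05')
    """
).fetchall()

# Corrected shape: the subquery is tied to the current AA row.
correlated = conn.execute(
    """
    SELECT AA.table_name, AA.column_name
    FROM cols AS AA
    WHERE AA.table_schema = 'wireless-2015-05'
      AND NOT EXISTS (SELECT 1
                      FROM cols AS BB
                      WHERE BB.table_schema = 'wireless-2015-04'
                        AND BB.table_name = AA.table_name
                        AND BB.column_name = AA.column_name)
    ORDER BY AA.table_name, AA.column_name
    """
).fetchall()

print("uncorrelated:", uncorrelated)
print("correlated:  ", correlated)
```

With the sample data the uncorrelated form returns nothing, while the correlated form returns T1.F2 and T2.F1, the two columns missing from wireless-2015-04.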

Hope this helps.

+0

Thanks. If you look at my edit, you'll see the benchmark results. Your rewrite of query 1 gave the best time.

+0

Very nice work, @TomCollins. Thanks for sharing the benchmark results. – zedfoxus

0

Your first query should look like this:

Select AA.* 
From 
(
    SELECT table_name, 
      column_name 
    From information_schema.columns 
    Where table_schema = 'wireless-2015-05' 
) AA 
LEFT JOIN 
(
    SELECT table_name, 
      column_name 
    From information_schema.columns 
    Where table_schema = 'wireless-2015-04' 
)BB 
on AA.table_name = BB.table_name 
AND AA.column_name = BB.column_name 

WHERE BB.table_name is null or BB.column_name is null 

The problem with your query:

You put the wrong condition in the WHERE clause:

WHERE AA.table_schema = 'wireless-2015-05' 
    AND BB.table_schema = 'wireless-2015-04' 
    AND BB.column_name IS NULL 

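When no matching record exists in BB, BB.table_schema is NULL, so the condition BB.table_schema = 'wireless-2015-04' is not true. The whole WHERE clause then fails for exactly the rows you are after, which is why you get no results.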
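One way to see the effect of that condition is to compare filtering BB in the WHERE clause with filtering it in the ON clause. This is a sketch under assumptions: an in-memory SQLite table (hypothetically named cols) stands in for information_schema.columns, and moving the schema filter into ON is shown as one possible fix alongside the derived-table rewrite.

```python
import sqlite3

# Hypothetical stand-in for information_schema.columns with the sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cols (table_schema TEXT, table_name TEXT, column_name TEXT)"
)
conn.executemany(
    "INSERT INTO cols VALUES (?, ?, ?)",
    [
        ("wireless-2015-05", "T1", "F1"),
        ("wireless-2015-05", "T1", "F2"),
        ("wireless-2015-05", "T2", "F1"),
        ("wireless-2015-04", "T1", "F1"),
    ],
)

# Filter on BB in WHERE: NULL-extended rows fail BB.table_schema = '...'
# and are thrown away before the IS NULL test can keep them.
where_filtered = conn.execute(
    """
    SELECT AA.table_name, AA.column_name
    FROM cols AS AA
    LEFT JOIN cols AS BB
      ON AA.table_name = BB.table_name
     AND AA.column_name = BB.column_name
    WHERE AA.table_schema = 'wireless-2015-05'
      AND BB.table_schema = 'wireless-2015-04'
      AND BB.column_name IS NULL
    """
).fetchall()

# Filter on BB in ON: only wireless-2015-04 rows can match, and AA rows
# without such a match survive with NULLs on the BB side.
on_filtered = conn.execute(
    """
    SELECT AA.table_name, AA.column_name
    FROM cols AS AA
    LEFT JOIN cols AS BB
      ON AA.table_name = BB.table_name
     AND AA.column_name = BB.column_name
     AND BB.table_schema = 'wireless-2015-04'
    WHERE AA.table_schema = 'wireless-2015-05'
      AND BB.column_name IS NULL
    ORDER BY AA.table_name, AA.column_name
    """
).fetchall()

print("filter in WHERE:", where_filtered)
print("filter in ON:   ", on_filtered)
```

On the sample rows the WHERE variant comes back empty, and the ON variant returns T1.F2 and T2.F1.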

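As for the second query, I think @zedfoxus is right.

You can also use EXCEPT, which gives you the desired result.

The following query returns all distinct values from the query on the left of the EXCEPT operator that are not also found in the query on the right.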

SELECT DISTINCT table_name, 
       column_name 
FROM information_schema.columns 
WHERE table_schema = 'wireless-2015-05' 

EXCEPT 

SELECT DISTINCT table_name, 
       column_name 
FROM information_schema.columns 
WHERE table_schema = 'wireless-2015-04' 

+0

The EXCEPT clause doesn't work. Googling shows that this clause is not available in MySQL.
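Where EXCEPT is unavailable (as far as I know, MySQL only gained it in 8.0.31), the same set difference can be written as a LEFT JOIN anti-join. The sketch below checks that equivalence in SQLite, which does support EXCEPT; the in-memory table cols is a made-up stand-in for information_schema.columns, so treat it as an illustration of the logic, not as MySQL behaviour.

```python
import sqlite3

# Hypothetical stand-in for information_schema.columns with the sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cols (table_schema TEXT, table_name TEXT, column_name TEXT)"
)
conn.executemany(
    "INSERT INTO cols VALUES (?, ?, ?)",
    [
        ("wireless-2015-05", "T1", "F1"),
        ("wireless-2015-05", "T1", "F2"),
        ("wireless-2015-05", "T2", "F1"),
        ("wireless-2015-04", "T1", "F1"),
    ],
)

# Set difference with EXCEPT (runs in SQLite; not in older MySQL versions).
with_except = conn.execute(
    """
    SELECT table_name, column_name
    FROM cols
    WHERE table_schema = 'wireless-2015-05'
    EXCEPT
    SELECT table_name, column_name
    FROM cols
    WHERE table_schema = 'wireless-2015-04'
    ORDER BY 1, 2
    """
).fetchall()

# The same difference as an anti-join against a derived table.
with_anti_join = conn.execute(
    """
    SELECT AA.table_name, AA.column_name
    FROM cols AS AA
    LEFT JOIN (SELECT table_name, column_name
               FROM cols
               WHERE table_schema = 'wireless-2015-04') AS BB
      ON AA.table_name = BB.table_name
     AND AA.column_name = BB.column_name
    WHERE AA.table_schema = 'wireless-2015-05'
      AND BB.table_name IS NULL
    ORDER BY AA.table_name, AA.column_name
    """
).fetchall()

print(with_except)
print(with_anti_join)
```

Both queries return the same two rows for the sample data, so the anti-join is a reasonable substitute when EXCEPT is not an option.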