2010-08-05 66 views
2

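Finding unique matches in 2 separate databases

I have 2 databases with the same structure but different data. Both are SQL Server 2005.

I want to find which people in DatabaseA also exist in DatabaseB. My best chance of a match is to match on FirstName and LastName.

I just want to bring back a list of: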

DatabaseA.Person DatabaseB.Person

Where:
1. I want all records from DatabaseA, even if there is no match in DatabaseB.
2. I only want records from DatabaseB where the FirstName/LastName matches exactly one record in DatabaseB.

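I have written a query that groups, but since I need to see more data than just FirstName and LastName, I can't group on those two columns alone - it gives me many duplicates. What kind of query should I use? Do I need a cursor?

Here is my query right now, which kind of works - except that I get duplicate results from DatabaseB, and all I want to know about DatabaseB is when the FirstName/LastName matches one distinct record and no others. My goal is a list of people that I know are the same person in both databases, so that I can build a dictionary of department code mappings between the employees.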

select 
count(DatabaseAEmployee.id) as matchcount 
, DatabaseAPerson.id as DatabaseAPersonid 
, DatabaseAEmployee.DeptCode DatabaseADeptCode 
, DatabaseAPerson.firstname as DatabaseAfirst 
, DatabaseAPerson.lastname as DatabaseAlast 
, DatabaseBPerson.id as DatabaseBPersonid 
, DatabaseBEmployee.DeptCode as DatabaseBDeptCode 
, DatabaseBPerson.firstname as DatabaseBfirst 
, DatabaseBPerson.lastname as DatabaseBlast 
, DatabaseAPerson.ssn as DatabaseAssn 
, DatabaseBPerson.ssn as DatabaseBssn 
, DatabaseAPerson.dateofbirth as DatabaseAdob 
, DatabaseBPerson.dateofbirth as DatabaseBdob 

FROM [DatabaseA].[dbo].Employee DatabaseAEmployee 
LEFT OUTER JOIN [DatabaseA].[dbo].Person DatabaseAPerson 
ON DatabaseAPerson.id = DatabaseAEmployee.id 
LEFT OUTER JOIN [DatabaseB].[dbo].Person DatabaseBPerson 
ON 
DatabaseAPerson.firstname = DatabaseBPerson.firstname 
AND 
DatabaseAPerson.lastname = DatabaseBPerson.lastname 
LEFT OUTER JOIN [DatabaseB].[dbo].Employee DatabaseBEmployee 
on DatabaseBEmployee.id = DatabaseBPerson.id 
group by 
DatabaseAPerson.firstname 
, DatabaseAPerson.lastname 
, DatabaseAPerson.id 
, DatabaseAEmployee.DeptCode 
, DatabaseBPerson.id 
, DatabaseBEmployee.DeptCode 
, DatabaseBPerson.firstname 
, DatabaseBPerson.lastname 
, DatabaseBPerson.ssn 
, DatabaseAPerson.ssn 
, DatabaseBPerson.dateofbirth 
, DatabaseAPerson.dateofbirth 

Here is what I am trying now, but I get duplicates on the left side:

with UniqueMatchedPersons (Id, FirstName, LastName) 
as (
select 
    p2.ID, p2.FirstName, p2.LastName 
from 
    [DatabaseA].[dbo].[Employee] p1 
INNER JOIN [DatabaseA].[dbo].[Person] p2 on p1.id = p2.id 
    inner join [DatabaseB].[dbo].[Person] p3 
     on p2.FirstName = p3.FirstName and p2.LastName = p3.LastName 
INNER JOIN [DatabaseB].[dbo].[Employee] p4 
on p3.id = p4.id 

group by p2.ID, p2.FirstName, p2.LastName 
having count(p2.ID) = 1 

) 

select p1.*, p2.* 
from DatabaseA.dbo.Person p1 
inner join UniqueMatchedPersons on p1.ID = UniqueMatchedPersons.ID 
left outer join DatabaseB.dbo.Person p2 
    on p1.FirstName = p2.FirstName and p1.LastName = p2.LastName 
+2

For which database? Please include the version. It sounds like you want 'INTERSECT'... – 2010-08-05 18:53:50

+0

Also include the table structure and some sample data, so we can help you more easily – 2010-08-05 18:55:48

Answers

2

Try this:

SELECT id,FirstName,Lastname 
FROM dba.Persons 
UNION 
SELECT b.id,b.FirstName,b.LastName 
FROM dbb.Persons as b 
INNER JOIN dba.Persons as a 
ON b.FirstName = a.FirstName AND b.LastName = a.LastName 

If you want everything from A and only those rows from B that have no match (which makes more sense to me), I would use this:

SELECT id,FirstName,Lastname 
FROM dba.Persons 
UNION 
SELECT b.id,b.FirstName,b.LastName 
FROM dbb.Persons as b 
LEFT OUTER JOIN dba.Persons as a 
ON b.FirstName = a.FirstName AND b.LastName = a.LastName 
WHERE a.id is null 
2

Try something like this:

Select dta.LastName, dta.FirstName, dta.[otherColumns], dtb.LastName, dtb.FirstName,
    dtb.[otherColumns]
From [databaseA].[table] as dta
LEFT OUTER JOIN [databaseB].[table] as dtb
    on dta.Lastname = dtb.LastName and dta.FirstName = dtb.FirstName

This should get you: 1) everyone in table A, and 2) everyone in table B who is also in table A.

2

This works for a last name/first name match on SQL Server (at least it should):

SELECT 
    A.* 
    , B.* 
FROM 
    DatabaseA.dbo.Person A 
    LEFT JOIN DatabaseB.dbo.Person B 
     ON A.FirstName = B.FirstName AND A.LastName = B.LastName 

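Edit: You mention that you get duplicates from DatabaseB and that you only want to match on first and last name. But you are also requesting other data (besides first/last name), and that is where the problem lies. If you want distinct rows, select only the columns that need to be distinct.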
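
One way to keep the extra columns without the duplicates is to decide first which names occur exactly once in DatabaseB, and only then join. The sketch below is illustrative rather than a tested SQL Server solution: it uses Python's built-in sqlite3 with an in-memory database as a stand-in, and the table names a_person and b_person and the sample rows are assumptions made up for the example.

```python
import sqlite3

# In-memory SQLite stands in for the two SQL Server databases (assumption).
# a_person and b_person play the roles of DatabaseA.dbo.Person and
# DatabaseB.dbo.Person; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a_person (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
CREATE TABLE b_person (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
INSERT INTO a_person VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray'), (3, 'Cy', 'Oz');
INSERT INTO b_person VALUES (10, 'Ann', 'Lee'), (20, 'Bob', 'Ray'), (21, 'Bob', 'Ray');
""")

rows = conn.execute("""
SELECT a.id, a.firstname, a.lastname, b.id
FROM a_person AS a
LEFT JOIN (
    -- Names that occur exactly once in B; MIN(id) is that single row's id.
    SELECT firstname, lastname, MIN(id) AS id
    FROM b_person
    GROUP BY firstname, lastname
    HAVING COUNT(*) = 1
) AS b
  ON a.firstname = b.firstname AND a.lastname = b.lastname
ORDER BY a.id
""").fetchall()

for row in rows:
    print(row)
```

Every row of A is returned once. Ann Lee is paired with her single counterpart in B, while Bob Ray (ambiguous in B) and Cy Oz (absent from B) come back with no B id. The id from the derived table can then be joined back to the full B tables to fetch any other columns.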

2

Using Transact-SQL, the following untested query should let you see only the unique matches:

select 
    p1.ID, p1.FirstName, p1.LastName 
from 
    [DatabaseA].[dbo].[Persons] p1 
    left outer join [DatabaseB].[dbo].[Persons] p2 
     on p1.FirstName = p2.FirstName and p1.LastName = p2.LastName 

group by p1.ID, p1.FirstName, p1.LastName 

having count(p1.ID) = 1 

If you are using SQL Server, this can then be wrapped in a common table expression, which you can join against.

with UniqueMatchedPersons (Id, FirstName, LastName) 
as (
    --query in previous code snippet 
) 
select persons.* 
from Persons 
inner join UniqueMatchedPersons on Persons.ID = UniqueMatchedPersons.ID 

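Update:

If you want to select fields from both tables, you can simply respecify the original join condition that matches on the names; this works because duplicate matches on the left side of the join have already been filtered out by the having aggregate condition.

Changing the select part of the snippet above to read as follows will let you select fields from either side of the join: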

select p1.*, p2.* 
from [DatabaseA].[dbo].[Persons] p1 
inner join UniqueMatchedPersons on p1.ID = UniqueMatchedPersons.ID 
left outer join [DatabaseB].[dbo].[Persons] p2 
    on p1.FirstName = p2.FirstName and p1.LastName = p2.LastName 

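Update 2:

To filter out duplicates on the left side (which also cause duplicates on the right side), you will have to remove the grouping on [DatabaseA].[dbo].[Persons].[ID].

When I refer to duplicates, I mean names in different rows that are identical in both characters and padding. If your first and last names have diacritic variations, the result of the name comparison is governed by the database collation (unless you explicitly declare a collation on the join expression). Likewise, if the names vary in spacing, padding, or punctuation, you may want to consider a different approach to name matching than a straight equality operator.

Try the following: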

with UniqueMatchedPersons (FirstName, LastName) 
as (
select 
    p1.FirstName, p1.LastName 
from 
    [DatabaseA].[dbo].[Person] p1 
    left outer join [DatabaseB].[dbo].[Person] p2 
     on p1.FirstName = p2.FirstName and p1.LastName = p2.LastName 

group by p1.FirstName, p1.LastName 
having count(p1.FirstName) = 1 
) 

select p1.*, p2.*, e1.*, e2.* 
from [DatabaseA].[dbo].[Person] p1 
inner join UniqueMatchedPersons ump 
     on p1.FirstName = ump.FirstName and p1.LastName = ump.LastName 
left outer join [DatabaseB].[dbo].[Person] p2 
     on p1.FirstName = p2.FirstName and p1.LastName = p2.LastName 
inner join [DatabaseA].[dbo].[Employee] e1 on p1.ID = e1.ID 
inner join [DatabaseB].[dbo].[Employee] e2 on e2.ID = p2.ID 

order by p1.id asc 
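
The remark above about padding and spelling variations can be made concrete with a small experiment. This is only a sketch under assumptions: it runs on Python's built-in sqlite3 rather than SQL Server, the tables a_person and b_person and their rows are invented, and SQLite compares text byte for byte by default, which makes the differences visible. On SQL Server the result of a plain = depends on the collation in force, and on SQL Server 2005 the trimming would be written LTRIM(RTRIM(...)).

```python
import sqlite3

# In-memory SQLite as a stand-in for the two databases (assumption);
# the sample names differ only in case and trailing padding.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a_person (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
CREATE TABLE b_person (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
INSERT INTO a_person VALUES (1, 'Ann ', 'LEE'), (2, 'Bob', 'Ray');
INSERT INTO b_person VALUES (10, 'ann', 'Lee'), (20, 'Rob', 'Ray');
""")

# Straight equality: 'Ann ' and 'ann' are different strings here.
exact = conn.execute("""
SELECT a.id, b.id
FROM a_person AS a
LEFT JOIN b_person AS b
  ON a.firstname = b.firstname AND a.lastname = b.lastname
ORDER BY a.id
""").fetchall()

# Normalized comparison: strip surrounding spaces and fold case first.
normalized = conn.execute("""
SELECT a.id, b.id
FROM a_person AS a
LEFT JOIN b_person AS b
  ON LOWER(TRIM(a.firstname)) = LOWER(TRIM(b.firstname))
 AND LOWER(TRIM(a.lastname)) = LOWER(TRIM(b.lastname))
ORDER BY a.id
""").fetchall()

print(exact)
print(normalized)
```

With the straight comparison neither person finds a match; after normalizing, person 1 pairs with row 10 while person 2 still has none, because Bob and Rob are genuinely different names. Punctuation and diacritic variants would need further handling that this sketch does not attempt.
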
+0

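Thank you very much for all the detail. I am trying your second query (I have updated my question to include it), but I am still getting duplicates on both the left and the right side. – user53885 2010-08-06 02:31:37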

+0

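It sounds like there are duplicates in the table `[DatabaseA].[dbo].[Persons]`: multiple people sharing the same first and last name but with different IDs. I will update my answer. – Rabid 2010-08-06 09:26:30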
