SAS hash merge - small dataset as the hash object

I am using the %HASHMERGE macro found at http://www.sascommunity.org/mwiki/images/2/22/Hashmerge.sas with the following sample datasets:
data working;
length IID TYPE $12;
input IID $ TYPE $;
datalines;
B 0
B 0
A 1
A 1
A 1
C 2
D 3
;
run;
data master;
length IID FIRST_NAME MIDDLE_NAME LAST_NAME SUFFIX_NAME $12;
input IID $ FIRST_NAME $ MIDDLE_NAME $ LAST_NAME $ SUFFIX_NAME $;
datalines;
X John James Smith Sr
Z Sarah Marie Jones .
Y Tim William Miller Jr
C Nancy Lynn Brown .
B Carol Elizabeth Collins .
A Wayne Mark Rooney .
;
run;
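I am trying to attach the _NAME variables from the master dataset to the working dataset using this hash merge. The output looks fine and is exactly what I want. However, in my real scenario the master dataset is too large to fit into a hash object, and the macro always loads it as the hash object. What I ultimately want is to flip the two datasets so that the working dataset is the hash object, but when I flip the code I cannot get the desired output. Below is the part of the macro that produces the desired output and needs to be adjusted, but I am not sure how to set this up: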
data OUTPUT;
  /* Define the PDV variables at compile time; no rows are read here. */
  if 0 then set MASTER (keep=IID FIRST_NAME MIDDLE_NAME LAST_NAME SUFFIX_NAME)
                WORKING (keep=IID);
  declare hash h_merge(dataset:"MASTER"); /* I want WORKING to be the hash object since it's smaller! */
  rc=h_merge.DefineKey("IID");
  rc=h_merge.DefineData("FIRST_NAME","MIDDLE_NAME","LAST_NAME","SUFFIX_NAME");
  rc=h_merge.DefineDone();
  do while(not eof);
    set WORKING (keep=IID) end=eof;
    /* Clear the names so an unmatched IID does not inherit the previous row's values. */
    call missing(FIRST_NAME,MIDDLE_NAME,LAST_NAME,SUFFIX_NAME);
    rc=h_merge.find();
    output;
  end;
  drop rc;
  stop;
run;
Desired output:
IID FIRST_NAME MIDDLE_NAME LAST_NAME SUFFIX_NAME
---------------------------------------------------
B Carol Elizabeth Collins
B Carol Elizabeth Collins
A Wayne Mark Rooney
A Wayne Mark Rooney
A Wayne Mark Rooney
C Nancy Lynn Brown
D
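One possible way to flip the roles is sketched below; it is not part of %HASHMERGE and has not been tuned for a large master dataset. It loads WORKING into a hash with multidata:"y" (because WORKING repeats IIDs), streams MASTER once, and then uses a hash iterator to emit the WORKING rows that never matched. The names working_seq, SEQ, h_work, h_seen, hi_work and output_flipped are hypothetical, and the sketch assumes IID is unique in MASTER.

```sas
/* Assumption: SEQ is a helper column that records WORKING's row order. */
data working_seq;
  set working (keep=IID);
  SEQ = _n_;
run;

data output_flipped;
  /* Define the PDV variables without reading any rows. */
  if 0 then set working_seq
                master (keep=IID FIRST_NAME MIDDLE_NAME LAST_NAME SUFFIX_NAME);

  /* WORKING goes into memory; multidata keeps its duplicate IIDs. */
  declare hash h_work(dataset:"working_seq", multidata:"y");
  rc = h_work.DefineKey("IID");
  rc = h_work.DefineData("IID", "SEQ");
  rc = h_work.DefineDone();

  /* Hypothetical helper hash: remembers which IIDs were found in MASTER. */
  declare hash h_seen();
  rc = h_seen.DefineKey("IID");
  rc = h_seen.DefineData("IID");
  rc = h_seen.DefineDone();

  declare hiter hi_work("h_work");

  /* Pass 1: stream MASTER and write one row per matching WORKING row. */
  do while (not eof);
    set master (keep=IID FIRST_NAME MIDDLE_NAME LAST_NAME SUFFIX_NAME) end=eof;
    rc = h_work.find();
    if rc = 0 then rc_seen = h_seen.replace();
    do while (rc = 0);
      output;
      rc = h_work.find_next();
    end;
  end;

  /* Pass 2: write WORKING rows with no MASTER match, names left missing. */
  call missing(FIRST_NAME, MIDDLE_NAME, LAST_NAME, SUFFIX_NAME);
  rc = hi_work.first();
  do while (rc = 0);
    if h_seen.check() ne 0 then output;
    rc = hi_work.next();
  end;

  drop rc rc_seen;
  stop;
run;

/* Restore WORKING's original order, then discard the helper column. */
proc sort data=output_flipped out=output_flipped (drop=SEQ);
  by SEQ;
run;
```

On the sample data this should give the rows B, B, A, A, A, C, D in that order, with D carrying missing names. The final sort is only needed if the original order of WORKING matters; without it the rows come out in MASTER order, followed by the unmatched ones.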
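You can filter the master dataset by the IIDs in the working dataset, and then merge the filtered master dataset with the working dataset. That way both datasets are small and easy to handle.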
I tried a 'SQL left join' filter, and it took longer than reading in the master dataset, sorting, and merging with 'working (in=a) master' and 'if a'. – Foxer
Try this filter: proc sql; create table New_master as select * from master (keep=IID FIRST_NAME MIDDLE_NAME LAST_NAME SUFFIX_NAME) where IID in (select IID from working (keep=IID)); quit;
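For completeness, the filter-then-merge idea from these comments could be put together roughly as follows. This is only a sketch: working_seq, SEQ and output_filtered are hypothetical names, IID is assumed to be unique in MASTER, and whether it beats a plain sort-and-merge on a large master dataset would have to be timed on the real data.

```sas
/* Step 1: keep only the MASTER rows whose IID appears in WORKING. */
proc sql;
  create table New_master as
  select IID, FIRST_NAME, MIDDLE_NAME, LAST_NAME, SUFFIX_NAME
  from master
  where IID in (select distinct IID from working)
  order by IID;
quit;

/* Step 2: record WORKING's row order in SEQ (an assumed helper column). */
data working_seq;
  set working (keep=IID);
  SEQ = _n_;
run;

proc sort data=working_seq;
  by IID;
run;

/* Step 3: many-to-one merge that keeps every WORKING row. */
data output_filtered;
  merge working_seq (in=a) New_master;
  by IID;
  if a;
run;

/* Step 4: restore the original WORKING order and drop the helper column. */
proc sort data=output_filtered out=output_filtered (drop=SEQ);
  by SEQ;
run;
```

Only the small filtered table and WORKING are sorted here; MASTER itself is read once by the SQL step.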