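Compare identical rows of two cell arrays - MATLAB

2014-11-04

I have a cell array of strings with 40,000 rows and another one with 400 rows. I need to find the rows of the first array that also appear in the second. Note that there may be many duplicates.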

The big array looks like this (40,000 rows):

Anna Frank 
Anna George 
Jane Peter 
Anna George 
Jane Peter  
etc. 

and I need to find the rows that match these:

Anna George 
Jane Peter 

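The only way I have found so far is two nested for loops with an if inside, but it is quite slow:

% Indices of bigTable rows that match some row of smallTable
Total_R = [];
for i = 2:size(bigTable,1)            % starts at 2, presumably to skip a header row
    for j = 1:size(smallTable,1)
        % true when both name columns of row i equal those of row j
        if all(strcmp(bigTable(i,1:2), smallTable(j,1:2)))
            Total_R(end+1,1) = i;
            break                     % one match is enough; avoids duplicate indices
        end
    end
end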

Have you considered concatenating the last name and the first name? Then a single ismember call should do the trick. – 2014-11-04 12:14:08
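The concatenation idea from the comment can be sketched as follows. This is a minimal sketch assuming two name columns; the '|' separator is an assumption that must not occur inside any name, since without a separator different pairs such as 'Ann'+'aGeorge' and 'Anna'+'George' would produce the same key.

```matlab
% Example inputs with the same shape as in the question
bigTable   = {'Anna','Frank'; 'Anna','George'; 'Jane','Peter'; 'Anna','George'; 'Jane','Peter'};
smallTable = {'Anna','George'; 'Jane','Peter'};

% Join the two name columns into one key per row.
% Assumption: the '|' separator never appears inside a name.
bigKeys   = strcat(bigTable(:,1),   '|', bigTable(:,2));
smallKeys = strcat(smallTable(:,1), '|', smallTable(:,2));

% One ismember call on the keys now compares whole rows
Total_R = find(ismember(bigKeys, smallKeys))
```

Because each key encodes a complete row, a row only matches when first and last name occur together in the same row of smallTable.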

Answer


I assume your inputs are set up like this -

bigTable = 
    'Anna' 'Frank' 
    'Anna' 'George' 
    'Jane' 'Peter' 
    'Anna' 'George' 
    'Jane' 'Peter' 
smallTable = 
    'Anna' 'George' 
    'Jane' 'Peter' 
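The display above corresponds to cell arrays of character vectors that can be built directly, which is handy for trying the approaches below. This sketch reproduces only the five example rows; the real 40,000-row table would come from your own data.

```matlab
% 5x2 cell array: column 1 = first name, column 2 = last name
bigTable = {'Anna', 'Frank';
            'Anna', 'George';
            'Jane', 'Peter';
            'Anna', 'George';
            'Jane', 'Peter'};

% 2x2 cell array holding the rows to look for
smallTable = {'Anna', 'George';
              'Jane', 'Peter'};
```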

Two approaches can be suggested for your case.

Approach #1

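An ismember-based approach. ismember does not accept the 'rows' option for cell arrays, so here it works cell by cell and a row is kept when both of its names are found in smallTable -

Total_R = find(sum(ismember(bigTable,smallTable),2)==2) 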
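Because the cell-by-cell check in Approach #1 only verifies that each name appears somewhere in smallTable, it can report a row whose first and last name never occur together. The sketch below uses a hypothetical extra row {'Anna','Peter'} to show this, and contrasts it with an exact whole-row check based on the numeric-label idea of Approach #2.

```matlab
% Hypothetical input: row 3 mixes a first name and a last name from different small rows
bigTable   = {'Anna','Frank'; 'Anna','George'; 'Anna','Peter'};
smallTable = {'Anna','George'; 'Jane','Peter'};

% Cell-by-cell check: 'Anna' and 'Peter' are each present, so row 3 is (wrongly) kept
perCell = find(sum(ismember(bigTable, smallTable), 2) == 2)

% Exact check: map every string to a numeric label, then compare whole rows
[~, ~, idx]  = unique([bigTable(:); smallTable(:)]);
big_labels   = reshape(idx(1:numel(bigTable)), size(bigTable));
small_labels = reshape(idx(numel(bigTable)+1:end), size(smallTable));
exactRows = find(ismember(big_labels, small_labels, 'rows'))
```

Here perCell contains rows 2 and 3, while exactRows contains only row 2. On the question's example data both give the same result, because no such mixed row exists there.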

Approach #2

%// Assign unique labels to each cell for both small and big cell arrays, so that 
%// later on you would be dealing with numeric arrays only and 
%// do not have to mess with cell arrays that were slowing you down 
[unqbig,matches1,idx] = unique([bigTable(:) ; smallTable(:)]) 
big_labels = reshape(idx(1:numel(bigTable)),size(bigTable)) 
small_labels = reshape(idx(numel(bigTable)+1:end),size(smallTable)) 

%// Detect which rows from small_labels exactly match with those from big_labels 
Total_R = find(ismember(big_labels,small_labels,'rows')) 

Or replace the ismember call in the last line with a bsxfun-based implementation -

Total_R = find(any(all(bsxfun(@eq,big_labels,permute(small_labels,[3 2 1])),2),3)) 
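In MATLAB R2016b and later (and in Octave, which calls it automatic broadcasting), the bsxfun call can be written with implicit expansion. A minimal sketch, assuming one of those versions; the label matrices are hard-coded to the values the unique/reshape step of Approach #2 yields for the example inputs.

```matlab
% Labels for the example inputs: Anna=1, Frank=2, George=3, Jane=4, Peter=5
big_labels   = [1 2; 1 3; 4 5; 1 3; 4 5];
small_labels = [1 3; 4 5];

% 5x2x2 logical array: element (i,k,j) is true when
% big_labels(i,k) equals small_labels(j,k)
matches = big_labels == permute(small_labels, [3 2 1]);

% A big row matches when all its columns agree with at least one small row
Total_R = find(any(all(matches, 2), 3))
```

The intermediate array has size N x 2 x M, so for 40,000 x 400 rows it holds 32 million logicals (about 32 MB); the ismember(...,'rows') variant avoids that memory cost.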

Output of these approaches for the assumed inputs -

Total_R = 
    2 
    3 
    4 
    5 

Note: with MATLAB 2010 or a more recent version you can skip unused outputs, so you could instead write [~,~,idx] = unique([bigTable(:); smallTable(:)]). – Divakar 2014-11-04 13:11:52