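Merging the contents of two tables (looking for MATLAB or pseudocode)

This question is not only for MATLAB users - if you know an answer to the problem in pseudocode, feel free to leave it!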

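I have two tables, Ta and Tb, with different numbers of rows and different numbers of columns. The contents are all cell arrays of text, but in the future they may also contain numeric cells.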

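I want to merge the contents of these tables according to the following rules:

Take the value of Ta(i,j) if Tb(i*,j*) is empty, and vice versa.
If both are available, take the value of Ta(i,j) (and optionally check whether they are the same).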
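For a single pair of rows that have already been aligned to the same columns, the two rules could look like the following. This is only a sketch: it assumes every cell holds a character vector, that '' marks a missing value, and the variable names (rowA, rowB, merged, conflict) are made up for illustration.

```matlab
% One row from each table, already aligned to the same columns (assumption).
rowA = {'a2', 'b2', '',   ''};     % values from Ta, '' = missing
rowB = {'',   'b2', 'c2', 'd2'};   % values from Tb

% Ta wins wherever it has a value; gaps are filled from Tb.
merged = rowA;
useB = cellfun(@isempty, rowA);
merged(useB) = rowB(useB);

% Optional check: cells where both tables have a value but disagree.
bothPresent = ~cellfun(@isempty, rowA) & ~cellfun(@isempty, rowB);
conflict = bothPresent & ~strcmp(rowA, rowB);
```

Here merged holds one value per column and conflict flags the columns where the two tables disagree (none in this example).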

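The tricky part, however, is that we have no unique row keys, only unique column keys. Note that I distinguish between i* and i. The reason is that a row in Ta may have a different index in Tb, and the same goes for the columns j* and j. The implication is:

We first need to determine which row of Ta corresponds to which row of Tb, and vice versa. We can do this by trying to cross-match any of the columns the tables share. However, we may not find a match (in which case we do not merge that row with another row).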
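To make the cross-matching concrete, here is a minimal sketch using the example tables from this question. It assumes all shared columns are cell arrays of character vectors; nMatch is a made-up name for the matrix that counts, for every pair of rows, how many shared columns agree.

```matlab
Ta = cell2table({'a1','b1','c1'; 'a2','b2','c2'}, ...
    'VariableNames', {'A','B','C'});
Tb = cell2table({'b2*','c2','d2'; 'b3','c3','d3'; 'b4','c4','d4'}, ...
    'VariableNames', {'B','C','D'});

% Columns present in both tables (here B and C).
shared = intersect(Ta.Properties.VariableNames, Tb.Properties.VariableNames);

% nMatch(i,k) = number of shared columns in which row i of Ta equals row k of Tb.
nMatch = zeros(height(Ta), height(Tb));
for i = 1:height(Ta)
    for k = 1:height(Tb)
        nMatch(i,k) = sum(strcmp(Ta{i,shared}, Tb{k,shared}));
    end
end
```

Only row 2 of Ta and row 1 of Tb agree on anything (column C), so that is the only candidate pair; row 1 of Ta finds no match at all.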

Question

How can we merge the contents of these two tables in the most efficient way?

Here are some resources that explain the problem in more detail:

1. MATLAB example to play with:

Ta = cell2table({...
    'a1', 'b1', 'c1'; ...
    'a2', 'b2', 'c2'}, ...
    'VariableNames', {'A','B', 'C'})
Tb = cell2table({...
    'b2*', 'c2', 'd2'; ...
    'b3', 'c3', 'd3'; ...
    'b4', 'c4', 'd4'}, ...
    'VariableNames', {'B','C', 'D'})

The resulting table Tc should look something like this:

Tc = cell2table({...
    'a1' 'b1' 'c1' ''; ...
    'a2' 'b2' 'c2' 'd2'; ...
    '' 'b3' 'c3' 'd3'; ...
    '' 'b4' 'c4' 'd4'}, ...
    'VariableNames', {'A', 'B','C', 'D'})

2. Possible first step

I tried the following:

Tc = outerjoin(Ta, Tb, 'MergeKeys', true)

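This works to some extent, but the problem is that it does not stack rows that appear to be similar. For example, the command above produces:

    A       B        C       D
    ____    _____    ____    ____
    ''      'b2*'    'c2'    'd2'
    ''      'b3'     'c3'    'd3'
    ''      'b4'     'c4'    'd4'
    'a1'    'b1'     'c1'    ''
    'a2'    'b2'     'c2'    ''

Here the rows

    ''      'b2*'    'c2'    'd2'
    'a2'    'b2'     'c2'    ''

should have been merged into one:

    'a2'    'b2'     'c2'    'd2'

So do we need one more step to stack these two together?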

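3. An example of a hurdle

For example, suppose we have something like this:

Ta =
    A       B       C
    ____    ____    ____
    'a1'    'b1'    'c1'
    'a2'    'b2'    'c2'

Tb =
    A       B       C
    ____    ____    ____
    'a1'    'b2'    'c3'

The question then is whether the row in Tb should be merged with row 1 or row 2 of Ta, whether all rows should be merged, or whether it should simply be kept as a separate row. Ideas on how to handle these kinds of situations would also be welcome.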

Source

2017-10-17 JohnAndrews

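This is very similar to your [previous question](https://stackoverflow.com/questions/46682751), right? – Wolfie

Not really, because here I really want to know how to join two tables together using MATLAB tables. It differs from the previous question in that I distinguish between rows and columns, and in where I deal with numeric data - if you can show me the connection to the previous question, that would be great. – JohnAndrews

Also note that in this question there are no unique rows. It is not just that the number of rows differs. – JohnAndrews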

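Here is a conceptual answer that may get you on your way:

Define a "scoring function" that tells you how well each row of Tb matches a given row of Ta.
Fill Tc with Ta.
For each row in Ta, determine the best match in Tb. If the match quality is above your criterion, treat the best match as a successful match.
If a successful match is found, "consume" it (use the information from Tb to enrich the corresponding row in Tc where necessary).
Continue until you reach the end of Ta; everything that has not yet been consumed from Tb can now be "appended" to Tc.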
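The procedure above can be sketched in MATLAB roughly as follows. This is one possible reading of it, not a definitive implementation: it assumes all columns are cell arrays of character vectors, uses "number of shared columns that agree" as the scoring function, and takes minScore = 1 as an arbitrary acceptance criterion. The variable names are made up for illustration.

```matlab
Ta = cell2table({'a1','b1','c1'; 'a2','b2','c2'}, ...
    'VariableNames', {'A','B','C'});
Tb = cell2table({'b2*','c2','d2'; 'b3','c3','d3'; 'b4','c4','d4'}, ...
    'VariableNames', {'B','C','D'});
minScore = 1;   % assumed criterion: at least one shared column must agree

varsA  = Ta.Properties.VariableNames;
varsB  = Tb.Properties.VariableNames;
shared = intersect(varsA, varsB);
onlyA  = setdiff(varsA, varsB);
onlyB  = setdiff(varsB, varsA);

% Fill Tc with Ta, adding empty columns for the variables only Tb has.
Tc = Ta;
for v = onlyB
    Tc.(v{1}) = repmat({''}, height(Ta), 1);
end

consumed = false(height(Tb), 1);
for i = 1:height(Ta)
    % Score every Tb row that has not been consumed yet against row i of Ta.
    score = -inf(height(Tb), 1);
    for k = find(~consumed)'
        score(k) = sum(strcmp(Ta{i,shared}, Tb{k,shared}));
    end
    [best, kBest] = max(score);
    if best >= minScore
        % Consume the match: Ta's values win, Tb only fills empty cells.
        for v = varsB
            if isempty(Tc.(v{1}){i})
                Tc.(v{1}){i} = Tb.(v{1}){kBest};
            end
        end
        consumed(kBest) = true;
    end
end

% Append everything from Tb that was never consumed.
rest = Tb(~consumed, :);
for v = onlyA
    rest.(v{1}) = repmat({''}, height(rest), 1);
end
Tc = [Tc; rest(:, Tc.Properties.VariableNames)];
```

With the example tables this pairs row 2 of Ta with row 1 of Tb, leaves row 1 of Ta alone, and appends the remaining two rows of Tb. Because rows of Ta are processed in order, an early row can consume a Tb row that a later row would have matched better.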

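Possible improvements:

In the selection of matches

Try consuming Ta instead of Tb, or use a more sophisticated heuristic to determine the order of consumption (e.g. compute all 'distances' and optimize the matching based on a cost function).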
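As one possible way to optimize the matching based on a cost function, MATLAB's matchpairs (available since R2019a) solves the assignment problem directly. The sketch below is an assumption about how it could be applied here, not something the answer itself uses: the cost of pairing two rows is taken to be the number of shared columns that disagree, and costUnmatched = 0.75 is an arbitrary choice, which means a pair is only made if its cost is below 1.5.

```matlab
Ta = cell2table({'a1','b1','c1'; 'a2','b2','c2'}, ...
    'VariableNames', {'A','B','C'});
Tb = cell2table({'b2*','c2','d2'; 'b3','c3','d3'; 'b4','c4','d4'}, ...
    'VariableNames', {'B','C','D'});
shared = intersect(Ta.Properties.VariableNames, Tb.Properties.VariableNames);

% Cost(i,k) = number of shared columns in which the two rows disagree.
Cost = zeros(height(Ta), height(Tb));
for i = 1:height(Ta)
    for k = 1:height(Tb)
        Cost(i,k) = sum(~strcmp(Ta{i,shared}, Tb{k,shared}));
    end
end

% Leaving a row unmatched costs 0.75 (assumed value), so pairing two rows
% only pays off when their cost is below 2*0.75.
costUnmatched = 0.75;
[M, unmatchedA, unmatchedB] = matchpairs(Cost, costUnmatched);
```

Each row of M is a pair [row of Ta, row of Tb]; unmatchedA and unmatchedB list the rows that should be kept as separate rows in the result.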

Note that these improvements are only essential if you run into a lot of mismatches with the basic solution.

In the definition of match quality

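I would suggest starting very simply here: for example, if you have 4 fields, just count how many fields match, or check whether all non-empty fields match.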
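Both simple scores can be computed in a couple of lines. A minimal sketch, assuming the two rows have already been restricted to the same 4 fields, that values are character vectors, and that '' means missing; the variable names are made up:

```matlab
rowA = {'a2', 'b2',  'c2', ''};     % 4 fields, '' = missing
rowB = {'a2', 'b2*', 'c2', 'd2'};

same    = strcmp(rowA, rowB);
present = ~cellfun(@isempty, rowA) & ~cellfun(@isempty, rowB);

nMatching        = sum(same & present);   % how many fields match
allNonEmptyMatch = all(same(present));    % do all non-empty fields match?
```

In this example two of the three fields present in both rows agree, so nMatching is 2 and allNonEmptyMatch is false.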

If you want to go further, consider evaluating the distance between values (e.g. MSE) or the distance between texts (e.g. Levenshtein distance).
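For the text case, the Levenshtein distance can be computed with the standard dynamic-programming table. The sketch below does this for the two values 'b2' and 'b2*' from the question; turning the distance into a similarity between 0 and 1 by dividing by the longer length is my own assumption, not part of the answer.

```matlab
s = 'b2';
t = 'b2*';

% D(i+1,j+1) = edit distance between the first i chars of s and first j of t.
D = zeros(numel(s) + 1, numel(t) + 1);
D(:,1) = 0:numel(s);
D(1,:) = 0:numel(t);
for i = 1:numel(s)
    for j = 1:numel(t)
        substitution = double(s(i) ~= t(j));
        D(i+1,j+1) = min([D(i,j+1) + 1, ...          % deletion
                          D(i+1,j) + 1, ...          % insertion
                          D(i,j) + substitution]);   % match or substitution
    end
end
dist = D(end,end);

% Assumed normalisation: 1 = identical, 0 = completely different.
similarity = 1 - dist / max(numel(s), numel(t));
```

A score like this would let 'b2' and 'b2*' count as a near match instead of a plain mismatch.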

Source

2017-10-19 14:01:35

I really like this. The scoring function in particular is a good idea, and it lets you tune things to improve speed. – JohnAndrews

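Here is a function that attempts to do the job. You supply the two tables, a threshold used to decide whether two rows should be merged, and a logical that says whether you prefer to take the value from the first table when a merge conflict arises. I did not prepare for edge cases, but see what it gives you with: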

TkeepAll=mergeTables(Tb,Ta,1,true) 
TmergeSome=mergeTables(Tb,Ta,0.25,true) 
TmergeAll=mergeTables(Tb,Ta,-1,true)

Here is the function:

function Tmerged=mergeTables(Ta,Tb,threshold,preferA) 
%% parameters 
% Ta and Tb are the two tables to merge 
% threshold=0.25; minimal ratio of identical values in rows for merge. 
% example: you have one row in table A with 3 values, but you only have two 
% values for the same columns in data B. if one of the values is identical 
% and one isn't, you have ratio of 1/2 aka 0.5, which passes a threshold of 
% 0.25 
% preferA=true; which to take when there is merge conflict 
%% see how well rows fit to each other 
% T1 is the table with fewer rows 
if size(Ta,1)<=size(Tb,1) 
    T1=Ta; 
    T2=Tb; 
    prefer1=preferA; 
else 
    T1=Tb; 
    T2=Ta; 
    prefer1=~preferA; 
end 
[commonVar1,commonVar2]=ismember(T1.Properties.VariableNames,... 
    T2.Properties.VariableNames); 
commonVar1=find(commonVar1); 
commonVar2(commonVar2==0)=[]; 
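% fit is an N-by-M matrix (N rows of T1, M rows of T2). fit(ii,jj) is the 
% fraction of common variables whose values are identical in row ii of 
% table 1 (shorter) and row jj of table 2 (longer). strcmp compares the 
% values column by column; commonVar2 is already ordered to line up with 
% commonVar1, so a value only counts if it sits in the same variable. 
fit=zeros(size(T1,1),size(T2,1)); 
for ii=1:size(T1,1) %rows of T1 
    for jj=1:size(T2,1) %rows of T2 
        fit(ii,jj)=sum(strcmp(T1{ii,commonVar1},T2{jj,commonVar2}))/length(commonVar1); 
    end 
end 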
%% pair rows according to fit 
% match has two columns, first one has T1 row number and second one has the 
% matching T2 row number 
unpaired1=true(size(T1,1),1); 
unpaired2=true(size(T2,1),1); 
count=0; 
match=[]; 
maxv=max(fit,[],2); 
[~,order]=sort(maxv,'descend'); 
order=order'; 
for ii=order %1:size(T1,1) 
    [maxv,maxi]=max(fit,[],2); 
    if maxv(ii)>threshold 
     count=count+1; 
     match(count,1)=ii; 
     match(count,2)=maxi(ii); 
     unpaired1(ii)=false; 
     unpaired2(match(count,2))=false; 
     fit(:,match(count,2))=nan; %exclude paired row from next pairing 
    end 
end 

%% prepare new variables 
% first variables common to the two tables 
Nrows=sum(unpaired1)+sum(unpaired2)+size(match,1); 
namesCommon={}; 
namesCommon(1:length(commonVar1))={T1.Properties.VariableNames{commonVar1}}; 
for vari=1:length(commonVar1) 
    if isempty(match) 
     mergedData={}; 
    else 
     if prefer1 
      mergedData=T1{match(:,1),commonVar1(vari)}; %#ok<*NASGU> 
     else 
      mergedData=T2{match(:,2),commonVar2(vari)}; 
     end 
    end 
    data1=T1{unpaired1,commonVar1(vari)}; 
    data2=T2{unpaired2,commonVar2(vari)}; 
    eval([namesCommon{vari},'=[data1;mergedData;data2];']); 
end 
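% The common variables above are stacked as [unpaired T1; merged; unpaired T2], 
% so the table-specific variables must use the same row order. 
% variables only in 1 
uncommonVar1=1:size(T1,2); 
uncommonVar1(commonVar1)=[]; 
names1=T1.Properties.VariableNames(uncommonVar1); 
if isempty(match) 
    rows1=find(unpaired1); 
else 
    rows1=[find(unpaired1);match(:,1)]; %unpaired rows first, then merged rows 
end 
for vari=1:length(uncommonVar1) 
    data1=T1{rows1,uncommonVar1(vari)}; 
    tmp=repmat({''},Nrows-size(data1,1),1); %blanks for unpaired T2 rows 
    eval([names1{vari},'=[data1;tmp];']); 
end 
% variables only in 2 
uncommonVar2=1:size(T2,2); 
uncommonVar2(commonVar2)=[]; 
names2=T2.Properties.VariableNames(uncommonVar2); 
if isempty(match) 
    rows2=find(unpaired2); 
else 
    rows2=[match(:,2);find(unpaired2)]; %merged rows first, then unpaired rows 
end 
for vari=1:length(uncommonVar2) 
    data2=T2{rows2,uncommonVar2(vari)}; 
    tmp=repmat({''},Nrows-size(data2,1),1); %blanks for unpaired T1 rows 
    eval([names2{vari},'=[tmp;data2];']); 
end 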
%% collect variables to a table 
names=sort([namesCommon,names1,names2]); 
str='table('; 
for vari=1:length(names) 
    str=[str,names{vari},',']; 
end 
str=[str(1:end-1),');']; 
Tmerged=eval(str);

Source

2017-10-23 17:53:45
