Add matrices based on row and column names

I'm sure this has been answered already, but I don't think I've been using the right search terms.

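Here is my problem. I have multiple matrices (I'll simplify to two here). Each row is a uniquely labeled individual (some individuals appear in both matrices, some do not), and the matrices share the same column headers.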

For example:

first<-matrix(rbinom(20,1,.5),4,5) 
first[,1]=c(122,145,186,199) 
colnames(first)<-c("ID",901,902,903,904) 
first 
      ID 901 902 903 904
[1,] 122   1   0   0   0
[2,] 145   0   0   0   1
[3,] 186   0   0   1   1
[4,] 199   1   0   0   0

second<-matrix(rbinom(30,1,.5),6,5) 
second[,1]=c(122,133,142,151,186,199) 
colnames(second)<-c("ID",901,902,903,904) 
second 
      ID 901 902 903 904
[1,] 122   0   1   1   1
[2,] 133   0   0   0   1
[3,] 142   1   1   0   1
[4,] 151   0   1   0   0
[5,] 186   1   0   1   1
[6,] 199   1   0   0   0

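I would like to add 'first' and 'second' together, matching rows by 'ID' and columns by name. This should give a matrix with 7 rows (the 4 IDs in 'first' plus the 3 IDs in 'second' that are not in 'first'; 'second' has 3 new IDs and 3 shared ones: 122, 133, 142, 145, 151, 186, 199) and the same number of columns.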
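A quick way to confirm the row count is to apply R's set functions to the IDs. This sketch assumes the IDs are exactly the ones printed in the example matrices above (copied by hand rather than read from the random matrices).

```r
# IDs copied from the example matrices printed above
ids_first  <- c(122, 145, 186, 199)
ids_second <- c(122, 133, 142, 151, 186, 199)

shared  <- intersect(ids_first, ids_second)    # IDs present in both matrices
new_ids <- setdiff(ids_second, ids_first)      # IDs only in 'second'
all_ids <- sort(union(ids_first, ids_second))  # one row per ID in the sum

print(shared)   # 122 186 199
print(new_ids)  # 133 142 151
print(all_ids)  # 122 133 142 145 151 186 199, i.e. 7 rows
```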

In this example, the result should be:

      ID 901 902 903 904
[1,] 122   1   1   1   1
[2,] 133   0   0   0   1
[3,] 142   1   1   0   1
[4,] 145   0   0   0   1
[5,] 151   0   1   0   0
[6,] 186   1   0   2   2
[7,] 199   2   0   0   0


2012-07-20 user1399311

I looked for a solution using built-in functions instead of a "for" loop, without success. So here is my approach:

set.seed(1) # make it reproducible 
first <- matrix(rbinom(20,1,.5),4,5) 
first[ ,1] <- c(122, 145, 186, 199) 
colnames(first) <- c("ID", 901, 902, 903, 904) 

second <- matrix(rbinom(30, 1, .5), 6, 5) 
second[ ,1] <- c(122, 133, 142, 151, 186, 199) 
colnames(second) <- c("ID", 901, 902, 903, 904) 

first 

      ID 901 902 903 904
[1,] 122   0   1   1   1
[2,] 145   1   0   0   1
[3,] 186   1   0   1   0
[4,] 199   1   0   0   1

second 
      ID 901 902 903 904
[1,] 122   0   0   1   1
[2,] 133   0   0   0   1
[3,] 142   1   1   1   0
[4,] 151   0   1   1   0
[5,] 186   0   1   1   1
[6,] 199   1   0   1   1

## stack them row-wise
mat <- rbind(first, second)

ind <- unique(mat[,"ID"])

result <- matrix(nrow = length(ind), ncol = ncol(mat))
result[,1] <- ind

for (i in seq_along(ind)) {
    # sum, column by column, all rows that carry this ID
    result[i,-1] <- colSums(mat[mat[ ,"ID"] == ind[i], -1, drop = FALSE])
}
colnames(result) <- colnames(mat)

result 
      ID 901 902 903 904
[1,] 122   0   1   2   2
[2,] 145   1   0   0   1
[3,] 186   1   1   2   1
[4,] 199   2   0   1   2
[5,] 133   0   0   0   1
[6,] 142   1   1   1   0
[7,] 151   0   1   1   0
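Because the matrices above come from rbinom(), it is hard to compare this output with the expected result in the question. The sketch below reruns the same loop on fixed values copied by hand from the question's printed matrices (an assumption made only for checking), then sorts the rows by ID so they line up with the question's expected result.

```r
# Fixed values copied from the question's printed matrices instead of rbinom()
first <- matrix(c(122, 1, 0, 0, 0,
                  145, 0, 0, 0, 1,
                  186, 0, 0, 1, 1,
                  199, 1, 0, 0, 0), nrow = 4, byrow = TRUE)
colnames(first) <- c("ID", 901, 902, 903, 904)

second <- matrix(c(122, 0, 1, 1, 1,
                   133, 0, 0, 0, 1,
                   142, 1, 1, 0, 1,
                   151, 0, 1, 0, 0,
                   186, 1, 0, 1, 1,
                   199, 1, 0, 0, 0), nrow = 6, byrow = TRUE)
colnames(second) <- c("ID", 901, 902, 903, 904)

# Same loop as above: stack, then sum each ID's rows column by column
mat <- rbind(first, second)
ind <- unique(mat[, "ID"])
result <- matrix(nrow = length(ind), ncol = ncol(mat))
result[, 1] <- ind
for (i in seq_along(ind)) {
  result[i, -1] <- colSums(mat[mat[, "ID"] == ind[i], -1, drop = FALSE])
}
colnames(result) <- colnames(mat)

# Sort by ID so the rows match the order of the expected result
result <- result[order(result[, "ID"]), ]
print(result)
```

The loop visits IDs in order of first appearance, which is why the unsorted result lists 145 before 133.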


2012-07-20 19:41:40 dickoa

I set up your problem slightly differently:

first <- matrix(rbinom(16,1,.5),4,4) 
rownames(first) <- c(122,145,186,199) 
colnames(first) <- c(901,902,903,904) 

second <- matrix(rbinom(24,1,.5),6,4) 
rownames(second) <- c(122,133,142,151,186,199) 
colnames(second) <- c(901,902,903,904)

The matrices now have row names:

> first 
    901 902 903 904
122   1   0   0   1
145   1   0   0   0
186   0   0   1   1
199   1   0   1   1
> second 
    901 902 903 904
122   1   1   0   0
133   0   0   1   1
142   1   0   1   0
151   1   0   1   1
186   0   1   0   1
199   0   0   0   0

Now it is easy to perform set operations on the row names:

SumOnID <- function(A, B){
    rnA <- rownames(A)
    rnB <- rownames(B)

    ls.id <- list(ids = intersect(rnA, rnB), # shared row names
                  idA = setdiff(rnA, rnB),   # only in A
                  idB = setdiff(rnB, rnA))   # only in B

    do.call(rbind,
      lapply(names(ls.id), function(x){
        rows <- ls.id[[x]]  # the row names belonging to this group
        if (x == "ids") return(A[rows, , drop = FALSE] + B[rows, , drop = FALSE])
        if (x == "idA") return(A[rows, , drop = FALSE])
        if (x == "idB") return(B[rows, , drop = FALSE])
      }))
}

Let's try it:

> SumOnID(first, second) 
    901 902 903 904
122   2   1   0   1
186   0   1   1   2
199   1   0   1   1
145   1   0   0   0
133   0   0   1   1
142   1   0   1   0
151   1   0   1   1
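To check this split against the question's expected result, here is a sketch that inlines the same three groups as SumOnID (shared row names summed, then rows only in A, then rows only in B). It assumes the 0/1 values printed in the question, copied by hand, with the IDs moved into the row names as above; the final reordering by numeric ID is an addition for easier comparison.

```r
# Assumption: the question's printed values, with IDs stored as row names
first <- matrix(c(1, 0, 0, 0,
                  0, 0, 0, 1,
                  0, 0, 1, 1,
                  1, 0, 0, 0), nrow = 4, byrow = TRUE)
rownames(first) <- c(122, 145, 186, 199)
colnames(first) <- c(901, 902, 903, 904)

second <- matrix(c(0, 1, 1, 1,
                   0, 0, 0, 1,
                   1, 1, 0, 1,
                   0, 1, 0, 0,
                   1, 0, 1, 1,
                   1, 0, 0, 0), nrow = 6, byrow = TRUE)
rownames(second) <- c(122, 133, 142, 151, 186, 199)
colnames(second) <- c(901, 902, 903, 904)

rnA <- rownames(first)
rnB <- rownames(second)
shared <- intersect(rnA, rnB)  # "122" "186" "199"

res <- rbind(first[shared, , drop = FALSE] + second[shared, , drop = FALSE],
             first[setdiff(rnA, rnB), , drop = FALSE],   # only in first
             second[setdiff(rnB, rnA), , drop = FALSE])  # only in second

# Order rows by numeric ID to compare with the question's expected output
res <- res[order(as.numeric(rownames(res))), ]
print(res)
```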


2012-07-20 19:38:27 Ryogi

Original answer

Building on @Ryogi's approach of using rownames and colnames to label your matrices, I suggest the following:

res <- rbind(first,second) 
res <- tapply(res, expand.grid(dimnames(res)), sum)

All rows with the same row name will be summed.
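To see why this works: expand.grid(dimnames(res)) produces one (row name, column name) pair per cell, with the first factor varying fastest, which is the same column-major order in which R stores the matrix. tapply() then sums every cell that shares a pair. The sketch below assumes the question's printed values, copied by hand, with IDs as row names.

```r
# Assumption: fixed values from the question, IDs stored as row names
first <- matrix(c(1, 0, 0, 0,
                  0, 0, 0, 1,
                  0, 0, 1, 1,
                  1, 0, 0, 0), nrow = 4, byrow = TRUE)
rownames(first) <- c(122, 145, 186, 199)
colnames(first) <- c(901, 902, 903, 904)

second <- matrix(c(0, 1, 1, 1,
                   0, 0, 0, 1,
                   1, 1, 0, 1,
                   0, 1, 0, 0,
                   1, 0, 1, 1,
                   1, 0, 0, 0), nrow = 6, byrow = TRUE)
rownames(second) <- c(122, 133, 142, 151, 186, 199)
colnames(second) <- c(901, 902, 903, 904)

res  <- rbind(first, second)        # a matrix may repeat row names
grid <- expand.grid(dimnames(res))  # one (row name, column name) pair per cell
res  <- tapply(res, grid, sum)      # sum all cells sharing the same pair
names(dimnames(res)) <- NULL        # drop the Var1/Var2 labels
print(res)
```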

When using data frames

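If your input is a data.frame, none of the above will work, because a data.frame cannot have duplicate row names. An alternative approach works here as well: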

rowsum(rbind(first, second), c(rownames(first), rownames(second)))

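This approach works for matrices too. Since it takes only one line, you may consider it simpler. I'd also guess it is more efficient, as it does not go through tapply. For the data format in your question, where the identifiers are stored in a separate first column, you can adapt it like this:

rowsum(rbind(first, second)[,-1], c(first[,1], second[,1]))

Note that the result still identifies the rows by row names; the IDs are not stored in a column.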
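Here is a sketch of rowsum() on data.frame input in the question's layout (IDs in their own column), including turning the row names back into an ID column. The data frames hold the question's printed values, copied by hand; building them with check.names = FALSE is an assumption made so the numeric column names stay as "901" to "904".

```r
# Assumption: the question's values, held in data frames with an ID column
first <- data.frame(ID    = c(122, 145, 186, 199),
                    `901` = c(1, 0, 0, 1),
                    `902` = c(0, 0, 0, 0),
                    `903` = c(0, 0, 1, 0),
                    `904` = c(0, 1, 1, 0),
                    check.names = FALSE)
second <- data.frame(ID    = c(122, 133, 142, 151, 186, 199),
                     `901` = c(0, 0, 1, 0, 1, 1),
                     `902` = c(1, 0, 1, 1, 0, 0),
                     `903` = c(1, 0, 0, 0, 1, 0),
                     `904` = c(1, 1, 1, 0, 1, 0),
                     check.names = FALSE)

stacked <- rbind(first, second)
# Group on the ID column explicitly; row names play no role here
res <- rowsum(stacked[, -1], stacked$ID)  # groups come out sorted by ID
# Move the group names back into an ID column, as in the question's layout
out <- data.frame(ID = as.numeric(rownames(res)), res,
                  row.names = NULL, check.names = FALSE)
print(out)
```

By default rowsum() sorts the groups (reorder = TRUE), so the rows come out in ascending ID order without a separate sort.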

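Interestingly, I only came across rowsum by accident while searching for rowSums, as part of a rather complicated approach to the data.frame version of this problem. Lucky me.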

Additional hints

If you find the resulting dimension names Var1 and Var2 confusing, you can use

names(dimnames(res)) <- NULL

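to remove them. If your data really is in the format you described, with the IDs in the first column, you can turn them into proper row names with the following commands: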

rownames(first) <- first[,1] 
first <- first[,-1]
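With several matrices, the two lines above can be wrapped in a small function. The helper name ids_to_rownames is hypothetical, and it assumes the ID column is named "ID" as in the question; the two-row matrix is just a cut-down copy of the question's example.

```r
# Hypothetical helper: move the "ID" column of a matrix into its row names
ids_to_rownames <- function(m) {
  rownames(m) <- m[, "ID"]                     # numbers become character names
  m[, colnames(m) != "ID", drop = FALSE]       # drop the now-redundant column
}

first <- matrix(c(122, 1, 0, 0, 0,
                  145, 0, 0, 0, 1), nrow = 2, byrow = TRUE,
                dimnames = list(NULL, c("ID", 901, 902, 903, 904)))
first2 <- ids_to_rownames(first)
print(first2)
```

Selecting the column by name rather than position (first[,-1]) avoids dropping the wrong column if "ID" is not first.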


2012-07-20 22:30:09 MvG

'expand.grid' works like magic. – Ryogi 2012-07-20 23:16:00

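I don't know why, but when I use rbind on my real dataset (I'm using the IDs as rownames), duplicated rownames get a number appended to the end. For example, if ID #165320128 appears 3 times, one row will be '165320128', the next '1653201281' and the last '1653201282'. – user1399311 2012-07-22 18:01:55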

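@user1399311, could it be that your original data is stored in data frames rather than matrices? Those behave the way you describe, because a data.frame does not allow duplicate row names. You could convert them to matrices, but I'll edit my answer to provide a better solution. – MvG 2012-07-22 18:24:51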
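A small sketch of the behaviour discussed in these comments, using two hypothetical two-row data frames: rbind() on data frames renames duplicated row names to keep them unique, while matrices keep duplicates. Keeping the real IDs in a separate vector and passing it to rowsum() sidesteps the issue.

```r
# Hypothetical two-row data frames with the IDs as row names
df1 <- data.frame(`901` = c(1, 0), `902` = c(0, 1),
                  row.names = c("122", "145"), check.names = FALSE)
df2 <- data.frame(`901` = c(0, 1), `902` = c(1, 1),
                  row.names = c("122", "133"), check.names = FALSE)

stacked <- rbind(df1, df2)
print(rownames(stacked))  # the second "122" has been renamed to stay unique

# Converting to matrices first keeps the duplicated row names
m <- rbind(as.matrix(df1), as.matrix(df2))
print(rownames(m))

# Or keep the real IDs in a separate vector and group on that
ids <- c(rownames(df1), rownames(df2))
res <- rowsum(stacked, ids)
print(res)
```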
