2016-03-03
4

Merge (average) columns with partially matching header names

I have data that looks like this:

  AAA_1 AAA_2 AAA_3 BBB_1 BBB_2 BBB_3 CCC
1     1     1     1     2     2     2   1
2     3     1     4     0     0     0   0
3     5     3     0     1     1     1   1

For each row, I want to take the mean of the columns that share a common feature, where the features are:
feature <- c("AAA","BBB","CCC") 

The desired output should look like this:

  AAA BBB CCC
1   1   2   1
2 2.6   0   0
3 2.6   1   1

For each pattern individually, I can do this:

data <- read.table("data.txt", header = TRUE, row.names = 1)
AAA <- as.matrix(rowMeans(data[, grepl("AAA", names(data))]))

But I don't know how to do the partial matching for several different patterns in one go.

I also tried a few other things, such as:

for (i in 1:length(features)){ 
feature[i] <- as.matrix(rowMeans(data[ , grepl(feature[i] , names(data)) ])) 
} 
+0

Could you please make your example reproducible? Also, please read [this](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Sotos

Answers

1
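library(dplyr)
library(tidyr)
data %>%
    # keep the row names as a column (deprecated in newer dplyr; tibble::rownames_to_column() is the replacement)
    add_rownames() %>%
    # long format: one row per (rowname, column) pair
    gather("variable", "value", -rowname) %>%
    # strip the "_<suffix>" so that AAA_1, AAA_2 and AAA_3 all become AAA
    mutate(variable = gsub("_.*$", "", variable)) %>%
    group_by(rowname, variable) %>%
    summarise(mean = mean(value)) %>%
    # back to wide format: one column per prefix
    spread(variable, mean)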
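
For readers on a recent tidyverse, the same reshape-then-average idea can be sketched with `pivot_longer()` and `pivot_wider()`, which supersede `gather()` and `spread()`. This is only an illustration, not part of the original answer: it assumes dplyr >= 1.0, tidyr >= 1.0 and the tibble package are installed, and it rebuilds the question's sample data inline instead of reading `data.txt`. The name `result` is made up for the example.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Sample data from the question, rebuilt inline (assumption: same values as data.txt)
data <- data.frame(
  AAA_1 = c(1, 3, 5), AAA_2 = c(1, 1, 3), AAA_3 = c(1, 4, 0),
  BBB_1 = c(2, 0, 1), BBB_2 = c(2, 0, 1), BBB_3 = c(2, 0, 1),
  CCC   = c(1, 0, 1)
)

result <- data %>%
  # keep the row names as an ordinary column
  rownames_to_column("rowname") %>%
  # long format: one row per (rowname, original column) pair
  pivot_longer(-rowname, names_to = "variable", values_to = "value") %>%
  # drop everything from the first underscore on, leaving the prefix
  mutate(variable = sub("_.*$", "", variable)) %>%
  group_by(rowname, variable) %>%
  summarise(mean = mean(value), .groups = "drop") %>%
  # wide format again: one column per prefix
  pivot_wider(names_from = variable, values_from = mean)

print(result)
```

The `.groups = "drop"` argument simply returns an ungrouped tibble, so nothing downstream depends on leftover grouping.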
2

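Here is another option for you. Looking at your column-name pattern, I chose to use gsub() to strip the numeric suffix and keep the prefix (the first three letters here). Using ind, which contains AAA, BBB and CCC, I used lapply() to subset the data for each element of ind, compute the row means, and keep only the column of row means. Then I used bind_cols() to create foo. The last step is to assign the column names to foo.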

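library(dplyr)

# Unique prefixes: strip a trailing "_<digits>" from each column name
ind <- unique(gsub("_\\d+$", "", names(mydf)))

# For each prefix, keep the matching columns and reduce them to one column of row means
lapply(ind, function(x){
    select(mydf, contains(x)) %>%
    transmute(out = rowMeans(.))
    }) %>%
bind_cols() -> foo

# add_rownames is left out here so that foo has exactly one column per prefix,
# which matches both the names assigned below and the output shown
names(foo) <- ind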

#  AAA BBB CCC 
#  (dbl) (dbl) (dbl) 
#1 1.000000  2  1 
#2 2.666667  0  0 
#3 2.666667  1  1 

DATA

mydf <- structure(list(AAA_1 = c(1L, 3L, 5L), AAA_2 = c(1L, 1L, 3L), 
AAA_3 = c(1L, 4L, 0L), BBB_1 = c(2L, 0L, 1L), BBB_2 = c(2L, 
0L, 1L), BBB_3 = c(2L, 0L, 1L), CCC = c(1L, 0L, 1L)), .Names = c("AAA_1", 
"AAA_2", "AAA_3", "BBB_1", "BBB_2", "BBB_3", "CCC"), class = "data.frame", row.names = c(NA, 
-3L)) 
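
To make the individual steps described above easier to inspect, here is a rough base-R sketch of the same sequence (prefixes, per-prefix row means, bind, rename). It is not the answerer's code: `parts` and `cols` are names invented for the illustration, the data is rebuilt with `data.frame()` for brevity, and `grepl(..., fixed = TRUE)` stands in for `contains()` as a plain substring match.

```r
# Sample data from the answer, written with data.frame() for brevity
mydf <- data.frame(
  AAA_1 = c(1L, 3L, 5L), AAA_2 = c(1L, 1L, 3L), AAA_3 = c(1L, 4L, 0L),
  BBB_1 = c(2L, 0L, 1L), BBB_2 = c(2L, 0L, 1L), BBB_3 = c(2L, 0L, 1L),
  CCC   = c(1L, 0L, 1L)
)

# Step 1: unique prefixes, obtained by removing a trailing "_<digits>"
ind <- unique(gsub("_\\d+$", "", names(mydf)))

# Step 2: for each prefix, the row means of the columns whose name contains it
parts <- lapply(ind, function(x) {
  cols <- grepl(x, names(mydf), fixed = TRUE)
  # drop = FALSE keeps a single matching column (CCC) two-dimensional
  rowMeans(mydf[, cols, drop = FALSE])
})

# Step 3: name the pieces after their prefixes and bind them into one data frame
names(parts) <- ind
foo <- as.data.frame(parts)

print(ind)
print(foo)
```

Because the match is a substring match, this only behaves well when no prefix is contained in another column's name, which holds for the sample data.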
+0

Same idea, different execution :) – Sotos

+0

@Sotos Yes, it seems we were working along the same lines at the same time. :) – jazzurro

+1

Great minds... :) – Sotos

2

Assuming your colnames are always structured as in your example, you can split the names and aggregate.

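# Strip everything from the first underscore on, leaving only the prefix
new_names <- unlist(strsplit(names(df),"\\_.*"))
colnames(df) <- new_names
# Testing with your data: drop = FALSE keeps a single matching column (CCC) as a data frame,
# so rowMeans() still receives a two-dimensional object
sapply(unique(new_names), function(i) rowMeans(df[, new_names==i, drop = FALSE]))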
#   AAA BBB CCC 
#[1,] 1.000000 2 1 
#[2,] 2.666667 0 0 
#[3,] 2.666667 1 1 

Data:

df <- structure(list(AAA_1 = c(1L, 3L, 5L), AAA_2 = c(1L, 1L, 3L), 
AAA_3 = c(1L, 4L, 0L), BBB_1 = c(2L, 0L, 1L), BBB_2 = c(2L, 
0L, 1L), BBB_3 = c(2L, 0L, 1L), CCC = c(1L, 0L, 1L)), .Names = c("AAA_1", 
"AAA_2", "AAA_3", "BBB_1", "BBB_2", "BBB_3", "CCC"), class = "data.frame", row.names = c(NA, 
-3L))
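
The comment about `drop = FALSE` in the code above can be checked with a small experiment. The sketch below is an illustration only: it uses a cut-down copy of the data and invented names (`small`, `prefixes`, `dropped`, `kept`, `outcome`) to show what happens when a prefix matches exactly one column.

```r
# Cut-down copy of the sample data: two AAA columns and the lone CCC column
small <- data.frame(AAA_1 = c(1, 3, 5), AAA_2 = c(1, 1, 3), CCC = c(1, 0, 1))
prefixes <- sub("_.*", "", names(small))

# Default indexing simplifies a single selected column to a plain vector
dropped <- small[, prefixes == "CCC"]

# drop = FALSE keeps it as a one-column data frame
kept <- small[, prefixes == "CCC", drop = FALSE]

# rowMeans() needs at least two dimensions, so the plain vector makes it fail
outcome <- tryCatch(
  rowMeans(dropped),
  error = function(e) "rowMeans failed on the dropped vector"
)

print(is.null(dim(dropped)))  # the vector has no dim attribute
print(dim(kept))              # the data frame still has rows and columns
print(outcome)
print(rowMeans(kept))
```

For a one-column group the row mean is just the column itself, which is why CCC passes through unchanged in the results shown in this thread.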