2015-07-10

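Extract data.frame column names from rows in the data.frame

I have several data.frames with roughly the same structure. For a reproducible example, I created two sample data frames, df1 and df2: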

df1 <- structure(list(sample = c(2L, 6L), data1 = c(56L, 78L), data2 = c(59L, 
27L), data3 = c(90L, 28L), data1namet = structure(c(1L, 1L), .Label = "Sam1", class = "factor"), 
    data2namab = structure(c(1L, 1L), .Label = "Test2", class = "factor"), 
    dataame = structure(c(1L, 1L), .Label = "Ex3", class = "factor")), .Names = c("sample", 
"data1", "data2", "data3", "data1namet", "data2namab", "dataame" 
), class = "data.frame", row.names = c(NA, -2L)) 

df1 
    sample data1 data2 data3 data1namet data2namab dataame 
1  2 56 59 90  Sam1  Test2  Ex3 
2  6 78 27 28  Sam1  Test2  Ex3 

df2 <- structure(list(sample = c(12L, 13L, 17L), data1 = c(56L, 78L, 
3L), data2 = c(59L, 27L, 2L), datest = structure(c(1L, 1L, 
1L), .Label = "Exa9", class = "factor"), dattestr = structure(c(1L, 
1L, 1L), .Label = "cz1", class = "factor")), .Names = c("sample", 
"data1", "data2", "datest", "dattestr"), class = "data.frame", row.names = c(NA, 
-3L)) 

df2 
    sample data1 data2 datest dattestr 
1  12 56 59  Exa9  cz1 
2  13 78 27  Exa9  cz1 
3  17 3 2  Exa9  cz1 

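The names of the data are stored in columns that come after the data columns. I am wondering whether there is a way to restructure the data.frames (about 40 of them) so that the data columns take the names stored in those columns as their column names, like this: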

df1 
    sample Sam1 Test2 Ex3 
1  2 56 59 90 
2  6 78 27 28 

df2 
    sample Exa9 cz1 
1  12 56 59 
2  13 78 27 
3  17 3 2 

Edit

I just realized that I also have other columns after the data columns, so my input data actually looks like this:

df1 <- structure(list(sample = c(2L, 6L), data1 = c(56L, 78L), data2 = c(59L, 
27L), data3 = c(90L, 28L), data1namet = structure(c(1L, 1L), .Label = "Sam1", class = "factor"), 
    data2namab = structure(c(1L, 1L), .Label = "Test2", class = "factor"), 
    dataame = structure(c(1L, 1L), .Label = "Ex3", class = "factor"), 
    ma = c("Jay", "Jay")), .Names = c("sample", "data1", "data2", 
"data3", "data1namet", "data2namab", "dataame", "ma"), row.names = c(NA, 
-2L), class = "data.frame") 

df1 
sample data1 data2 data3 data1namet data2namab dataame ma 
1  2 56 59 90  Sam1  Test2  Ex3 Jay 
2  6 78 27 28  Sam1  Test2  Ex3 Jay 

df2 <- structure(list(sample = c(12L, 13L, 17L), data1 = c(56L, 78L, 
3L), data2 = c(59L, 27L, 2L), datest = structure(c(1L, 1L, 1L 
), .Label = "Exa9", class = "factor"), dattestr = structure(c(1L, 
1L, 1L), .Label = "cz1", class = "factor"), add = c(2, 2, 2)), .Names = c("sample", 
"data1", "data2", "datest", "dattestr", "add"), row.names = c(NA, 
-3L), class = "data.frame") 

df2 
sample data1 data2 datest dattestr add 
1  12 56 59 Exa9  cz1 2 
2  13 78 27 Exa9  cz1 2 
3  17  3  2 Exa9  cz1 2 

In this case, the ma and add columns are not part of the data and should be appended at the end, like this:

df1 
    sample Sam1 Test2 Ex3 ma 
1  2 56 59 90 Jay 
2  6 78 27 28 Jay 

df2 
    sample Exa9 cz1 add 
1  12 56 59 2 
2  13 78 27 2 
3  17 3 2 2 

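Do I understand correctly that you only want to remove certain columns while keeping the rest of the data frame, including the column names? – RHertel

Almost. What I want to do is extract the new column names from the columns that will be removed (the columns that contain the 'names' of the data columns). – nebuloso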

Answers

One can start by identifying which columns should be kept:

keep_col <- which(sapply(df2, is.numeric)) 
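
To see what this step selects, here is a minimal sketch that rebuilds the original df2 by hand and prints the result. Recreating the name columns with stringsAsFactors = TRUE is an assumption made to mirror the factor columns in the question.

```r
# Rebuild the original df2 from the question (name columns become factors)
df2 <- data.frame(
  sample   = c(12L, 13L, 17L),
  data1    = c(56L, 78L, 3L),
  data2    = c(59L, 27L, 2L),
  datest   = rep("Exa9", 3),
  dattestr = rep("cz1", 3),
  stringsAsFactors = TRUE
)

# Factors are not numeric, so only sample, data1 and data2 are selected
keep_col <- which(sapply(df2, is.numeric))
print(keep_col)
```

The result is a named integer vector holding the positions of the numeric columns, which is why the next step can do arithmetic on it.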

After that, a little work is needed to extract the new column names and rename the corresponding columns in the data frame:

names <- df2[1,keep_col[-1] + length(keep_col)-1] 
colnames(df2)[keep_col[-1]] <- as.character(unlist(names)) 
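
The index arithmetic may be easier to follow with the intermediate values spelled out. The sketch below assumes the layout of the original df2: one numeric sample column, then k data columns, then k name columns in the same order. The variable names data_cols, name_cols and new_names are introduced here for illustration only.

```r
# Rebuild the original df2 from the question
df2 <- data.frame(
  sample   = c(12L, 13L, 17L),
  data1    = c(56L, 78L, 3L),
  data2    = c(59L, 27L, 2L),
  datest   = rep("Exa9", 3),
  dattestr = rep("cz1", 3),
  stringsAsFactors = TRUE
)

keep_col  <- which(sapply(df2, is.numeric))  # positions 1, 2, 3
data_cols <- keep_col[-1]                    # drop the sample column: 2, 3
# Each name column sits length(data_cols) positions to the right of its
# data column; this equals keep_col[-1] + length(keep_col) - 1
name_cols <- data_cols + length(data_cols)   # 4, 5

# Take the name from the first row of each name column
new_names <- vapply(df2[1, name_cols, drop = FALSE], as.character, character(1))
colnames(df2)[data_cols] <- unname(new_names)
print(colnames(df2))
```

At this point the name columns are still present; they are dropped in the final step.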

Finally, the data frame can be restructured by keeping only the desired columns:

df2 <- df2[,keep_col] 
#> df2 
# sample Exa9 cz1 
#1  12 56 59 
#2  13 78 27 
#3  17 3 2 

To apply this transformation to several different data frames, the code can be wrapped in a function:

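summarize_table <- function(x) { 
  # Numeric columns: the sample column followed by the data columns 
  keep_col <- which(sapply(x, is.numeric)) 
  # The name columns directly follow the data columns, in the same order 
  names <- x[1, keep_col[-1] + length(keep_col) - 1] 
  colnames(x)[keep_col[-1]] <- as.character(unlist(names)) 
  # Return the result visibly, keeping only the numeric columns 
  x[, keep_col] 
} 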

If the data frames are stored in a list, the function summarize_table() can be used with lapply() to obtain the result for each data frame:

my_dfs <- list(df1,df2) 
out <- lapply(my_dfs,summarize_table) 
#> out 
#[[1]] 
# sample Sam1 Test2 Ex3 
#1  2 56 59 90 
#2  6 78 27 28 
# 
#[[2]] 
# sample Exa9 cz1 
#1  12 56 59 
#2  13 78 27 
#3  17 3 2 
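
With about 40 data frames, typing them all into list() by hand gets tedious. One possible sketch, assuming the data frames live in the workspace under names that follow a pattern such as df1, df2, ... (the pattern is a guess and would need to match the real names), collects them with ls() and mget(). The function body repeats the logic of summarize_table() above.

```r
summarize_table <- function(x) {
  keep_col <- which(sapply(x, is.numeric))
  new_names <- x[1, keep_col[-1] + length(keep_col) - 1]
  colnames(x)[keep_col[-1]] <- as.character(unlist(new_names))
  x[, keep_col]
}

# Two sample data frames with the layout of the original question
df1 <- data.frame(sample = c(2L, 6L), data1 = c(56L, 78L),
                  data2 = c(59L, 27L), data3 = c(90L, 28L),
                  data1namet = rep("Sam1", 2), data2namab = rep("Test2", 2),
                  dataame = rep("Ex3", 2), stringsAsFactors = TRUE)
df2 <- data.frame(sample = c(12L, 13L, 17L), data1 = c(56L, 78L, 3L),
                  data2 = c(59L, 27L, 2L), datest = rep("Exa9", 3),
                  dattestr = rep("cz1", 3), stringsAsFactors = TRUE)

# Collect every object whose name looks like df<number> into a named list
df_names <- ls(pattern = "^df[0-9]+$")
my_dfs <- mget(df_names)

out <- lapply(my_dfs, summarize_table)
print(names(out))
```

Because mget() returns a named list, each result in out can be looked up by the name of the data frame it came from, for example out$df1.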

Edit / Addendum

The following modified version should also handle the case described in the edit to the question:

summarize_tab2 <- function(x){ 
keep_col <- which(sapply(x, is.numeric)) 
first_block <- c(keep_col[1],keep_col[which(diff(keep_col)==1)]) 
add_col <- FALSE 
if (2 * (length(keep_col) - 1) + 1 < ncol(x)) add_col <- TRUE 
keep_col1 <- keep_col[1:length(first_block)] 
names <- x[1,keep_col1[-1] + length(keep_col1) - 1] 
colnames(x)[keep_col1[-1]] <- as.character(unlist(names)) 
df_t <- x[,keep_col] 
if (add_col) df_t <- cbind(df_t, x[(2 * (ncol(df_t) - 1) + 2):ncol(x)]) 
return(df_t) 
} 
my_dfs <- list(df1, df2, df3, df4) 
out <- lapply(my_dfs, summarize_tab2) 
#> out 
#[[1]] 
# sample Sam1 Test2 Ex3 ma 
#1  2 56 59 90 Jay 
#2  6 78 27 28 Jay 
# 
#[[2]] 
# sample Exa9 cz1 add 
#1  12 56 59 2 
#2  13 78 27 2 
#3  17 3 2 2 
# 
#[[3]] 
# sample Sam1 Test2 Ex3 
#1  2 56 59 90 
#2  6 78 27 28 
# 
#[[4]] 
# sample Exa9 cz1 
#1  12 56 59 
#2  13 78 27 
#3  17 3 2 

Here the data frames df3 and df4 are the data frames df1 and df2 from the original post, respectively.
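
As an alternative, the same layout can be handled by working purely with positions. This is only a sketch under stated assumptions, not the method used above: the first column is the numeric sample column, the k data columns are numeric and follow it directly, the k name columns come next and are not numeric, and anything after them is an extra column. The function name rename_by_position is hypothetical.

```r
rename_by_position <- function(x) {
  num <- unname(vapply(x, is.numeric, logical(1)))
  runs <- rle(num)
  stopifnot(runs$values[1])            # the leading columns must be numeric
  k <- runs$lengths[1] - 1             # leading numeric run minus the sample column
  data_cols <- 1 + seq_len(k)
  name_cols <- 1 + k + seq_len(k)
  extra <- setdiff(seq_along(x), c(1, data_cols, name_cols))
  # Use the first row of each name column as the new column name
  new_names <- vapply(x[1, name_cols, drop = FALSE], as.character, character(1))
  colnames(x)[data_cols] <- unname(new_names)
  x[c(1, data_cols, extra)]
}

# The edited sample data: df1 has a character extra column, df2 a numeric one
df1 <- data.frame(sample = c(2L, 6L), data1 = c(56L, 78L),
                  data2 = c(59L, 27L), data3 = c(90L, 28L),
                  data1namet = factor(c("Sam1", "Sam1")),
                  data2namab = factor(c("Test2", "Test2")),
                  dataame = factor(c("Ex3", "Ex3")),
                  ma = c("Jay", "Jay"), stringsAsFactors = FALSE)
df2 <- data.frame(sample = c(12L, 13L, 17L), data1 = c(56L, 78L, 3L),
                  data2 = c(59L, 27L, 2L),
                  datest = factor(rep("Exa9", 3)),
                  dattestr = factor(rep("cz1", 3)),
                  add = c(2, 2, 2), stringsAsFactors = FALSE)

res1 <- rename_by_position(df1)
res2 <- rename_by_position(df2)
print(res1)
print(res2)
```

The sketch relies on the name columns being non-numeric, since that is what ends the leading run of numeric columns. If a data frame stored its names as numbers, k would have to be supplied explicitly.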

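This looks great, thank you! But I have about 40 different data frames. How can I apply this to all 40 data.frames? – nebuloso

Thank you for your effort! However, I just realized that I have more columns after the 'data' and 'dataname' parts (I edited the question with sample data to illustrate what I mean). Could you have a quick look, @RHertel? Thanks. – nebuloso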

The following should work:

library(plyr) 

cols.to.rename <- grep('^data(.)$', colnames(df1)) 
cols.of.names <- max(cols.to.rename)+seq(1,length(cols.to.rename)) 
the.names <- lapply(df1[1,cols.of.names], as.character) 

df1.mod <- df1 
colnames(df1.mod)[cols.to.rename] <- the.names 
df1.mod <- df1.mod[-cols.of.names] 

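It renames every dataX column to the (first) value of the corresponding column that follows the last dataX column. Then it removes all name columns from the data frame.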
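
To reuse this approach on several data frames, the steps can be wrapped in a function that takes the column pattern as an argument. This is a sketch only: the function name rename_by_pattern and its pattern parameter are hypothetical, and it still assumes that the data columns can be found by a regular expression and that the name columns directly follow the last matching column.

```r
rename_by_pattern <- function(x, pattern = "^data.$") {
  cols_to_rename <- grep(pattern, colnames(x))
  # The name columns are assumed to start right after the last data column
  cols_of_names <- max(cols_to_rename) + seq_along(cols_to_rename)
  new_names <- vapply(x[1, cols_of_names, drop = FALSE], as.character, character(1))
  colnames(x)[cols_to_rename] <- unname(new_names)
  # Drop only the name columns, so trailing extra columns are kept
  x[-cols_of_names]
}

# The edited df1 from the question, with the extra column ma
df1 <- data.frame(sample = c(2L, 6L), data1 = c(56L, 78L),
                  data2 = c(59L, 27L), data3 = c(90L, 28L),
                  data1namet = factor(c("Sam1", "Sam1")),
                  data2namab = factor(c("Test2", "Test2")),
                  dataame = factor(c("Ex3", "Ex3")),
                  ma = c("Jay", "Jay"), stringsAsFactors = FALSE)

df1_mod <- rename_by_pattern(df1)
print(df1_mod)
```

Since only the name columns are removed, an extra column such as ma stays in place without special handling.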

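Thanks for your answer, @Christoph Sommer! However, as far as I understand, 'grep' searches for one fixed pattern. In my data the column names are not always the same, so this does not work. They change from data.frame to data.frame, and I have about 40 data.frames that I want to change. I did not express this clearly, sorry (I changed my sample data to make it clearer). – nebuloso

I see. That leaves two things very unclear: first, in your tables, how do you tell the columns that contain data from the columns that contain names? Second, how do you know which name column belongs to which data column? –

The way I managed to organize the data is that the data columns are sorted numerically. So for the first df, data1, data2 and data3 are the 'data columns', followed by the 3 'data name columns'. I hope this is clearer? – nebuloso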