2012-07-27 95 views
7

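Merge (opposite of split) pairs of columns in R

I have columns like the ones below. Each variable comes as a pair of columns with the suffixes "a" and "b", for example col1a, col1b, colNa, colNb, and so on to the end of the file (> 50000 columns).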

mydataf <- data.frame (Ind = 1:5, col1a = sample (c(1:3), 5, replace = T), 
    col1b = sample (c(1:3), 5, replace = T), colNa = sample (c(1:3), 5, replace = T), 
    colNb = sample (c(1:3),5, replace = T), 
    K_a = sample (c("A", "B"),5, replace = T), 
    K_b = sample (c("A", "B"),5, replace = T)) 

mydataf 
    Ind col1a col1b colNa colNb K_a K_b 
1 1  1  1  2  3 B A 
2 2  1  3  2  2 B B 
3 3  2  1  1  1 B B 
4 4  3  1  1  3 A B 
5 5  1  1  3  2 B A 

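Except for the first column (Ind), I want to collapse each pair of columns so that the data frame looks like the one below, with the suffixes "a" and "b" removed at the same time. The merged characters or digits should also be sorted: 1 comes before 2, and A comes before B.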

Ind col1 colN K_ 
    1 11  23 AB 
    2 13  22 BB 
    3 12  11 BB 
    4 13  13 AB 
    5 11  23 AB 

Edit: the grep call in the answer (probably) has a problem when column names are similar.

mydataf <- data.frame (col_1_a = sample (c(1:3), 5, replace = T), 
    col_1_b = sample (c(1:3), 5, replace = T), col_1_Na = sample (c(1:3), 5, replace = T), 
    col_1_Nb = sample (c(1:3),5, replace = T), 
    K_a = sample (c("A", "B"),5, replace = T), 
    K_b = sample (c("A", "B"),5, replace = T)) 
n <- names(mydataf) 
nm <- c(unique(substr(n, 1, nchar(n)-1))) 
df <- data.frame(sapply(nm, function(x){ 
          idx <- grep(x, n) 
          cols <- mydataf[idx] 
          x <- apply(cols, 1, 
             function(z) paste(sort(z), collapse = "")) 
          return(x) 
          })) 
names(df) <- nm 
df 

col_1_ col_1_N K_ 
1 2233  23 BB 
2 2233  22 BB 
3 1123  13 AB 
4 1223  12 AB 
5 2333  33 AB 
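The four-character values in the col_1_ column above come from over-matching: the unanchored pattern "col_1_" is also contained in col_1_Na and col_1_Nb. A minimal sketch of the effect, using only the column names from the example; the anchored pattern at the end is one possible way to tighten the match, not necessarily the only one:

```r
# Column names from the edited example above
n <- c("col_1_a", "col_1_b", "col_1_Na", "col_1_Nb", "K_a", "K_b")

# Unanchored pattern: "col_1_" occurs in four names, so two pairs get mixed
loose <- grep("col_1_", n, value = TRUE)

# Anchored pattern with an explicit suffix: only the intended pair matches
strict <- grep("^col_1_[ab]$", n, value = TRUE)

print(loose)
print(strict)
```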

Answers

5
mydataf 
    Ind col1a col1b colNa colNb K_a K_b 
1 1  2  1  1  1 A A 
2 2  1  2  1  3 B A 
3 3  1  2  3  2 A A 
4 4  1  2  3  1 A B 
5 5  1  2  2  1 A A 
n <- names(mydataf) 
nm <- c("Ind", unique(substr(n, 1, nchar(n)-1)[-1])) 
df <- data.frame(sapply(nm, function(x){ 
          idx <- grep(paste0(x, "[ab]?$"), n) 
          cols <- mydataf[idx] 
          x <- apply(cols, 1, 
             function(z) paste(sort(z), collapse = "")) 
          return(x) 
          })) 
names(df) <- nm 
df 
    Ind col1 colN K_ 
1 1 12 11 AA 
2 2 12 13 AB 
3 3 12 23 AA 
4 4 12 13 AB 
5 5 12 12 AA 
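As an alternative sketch (not the approach used above), the pairs can be looked up by exact name instead of by regular expression, which sidesteps partial matches and avoids a row-wise apply. The helper name collapse_pairs is hypothetical, and the sketch assumes that every column other than the id column ends in "a" or "b" and has its partner present. The data are the values printed in the question.

```r
# Data frame with the values shown in the question
mydataf <- data.frame(
  Ind   = 1:5,
  col1a = c(1, 1, 2, 3, 1), col1b = c(1, 3, 1, 1, 1),
  colNa = c(2, 2, 1, 1, 3), colNb = c(3, 2, 1, 3, 2),
  K_a   = c("B", "B", "B", "A", "B"),
  K_b   = c("A", "B", "B", "B", "A"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse each <stem>a / <stem>b pair into one column.
# Assumes every non-id column ends in "a" or "b" and both partners exist.
collapse_pairs <- function(d, id = "Ind") {
  n <- setdiff(names(d), id)
  stems <- unique(sub("[ab]$", "", n))   # drop the trailing suffix
  out <- d[id]
  for (s in stems) {
    a <- d[[paste0(s, "a")]]
    b <- d[[paste0(s, "b")]]
    # Factors cannot be compared with pmin/pmax, so convert them first
    if (is.factor(a)) a <- as.character(a)
    if (is.factor(b)) b <- as.character(b)
    # Smaller value first, larger second: a vectorised two-element sort
    out[[s]] <- paste0(pmin(a, b), pmax(a, b))
  }
  out
}

res <- collapse_pairs(mydataf)
print(res)
```

Because pmin and pmax work on whole columns, this does one pass per pair rather than one paste per row, which may matter with more than 50000 columns.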

Thanks for the solution, but the function seems to have a problem when variable names are similar; see the recent edit. – shNIL 2012-07-28 16:57:04

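@sharnil, I replaced `x` with `paste0(x, "[ab]?$")` in `grep`. It now requires the column name to end with a or b, or with neither (for the "Ind" case). If there is no "Ind" column, you can remove the `?`. – Julius 2012-07-28 17:11:39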
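The role of the `?` in that pattern can be checked on the column names alone. A small sketch, assuming the names from the original question:

```r
n <- c("Ind", "col1a", "col1b", "colNa", "colNb", "K_a", "K_b")

# With "?" the suffix is optional, so the unsuffixed "Ind" column matches
with_q <- grep(paste0("Ind", "[ab]?$"), n, value = TRUE)

# Without "?" a trailing a or b is mandatory, so "Ind" no longer matches
without_q <- grep(paste0("Ind", "[ab]$"), n, value = TRUE)

# Paired columns match either way
pair <- grep(paste0("col1", "[ab]$"), n, value = TRUE)

print(with_q)
print(without_q)
print(pair)
```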