2013-02-26 93 views
1

Extracting data and inserting it into a table

I have a data frame that looks like this:

                ML1    ML1 SD       ML2    ML2 SD ...
aPhysics0 0.8730469 0.3329205 0.5950521 0.4908820
aPhysics1 0.8471074 0.3598839 0.6473829 0.4777848
aPhysics2 0.8593750 0.3476343 0.7031250 0.4568810
aPhysics3 0.8875000 0.3159806 0.7000000 0.4582576
aPhysics4 0.7962963 0.4027512 0.7654321 0.4237285
...

and I want to use the row names to create a data frame that looks like this:

     Institution Subject Class       ML1    ML1 SD       ML2    ML2 SD ...
[1,]           A Physics     0 0.8730469 0.3329205 0.5950521 0.4908820
[2,]           A Physics     1 0.8471074 0.3598839 0.6473829 0.4777848
[3,]           A Physics     2 0.8593750 0.3476343 0.7031250 0.4568810
[4,]           A Physics     3 0.8875000 0.3159806 0.7000000 0.4582576
[5,]           A Physics     4 0.7962963 0.4027512 0.7654321 0.4237285
...

What is the best way to do this?

Answers

3

Assuming your data.frame is df:

# Pad "Physics" with spaces so each row name splits into three parts
header <- as.data.frame(do.call(rbind, strsplit(gsub("Physics", " Physics ", 
       rownames(df)), " "))) 
names(header) <- c("Institution", "Subject", "Class") 
# Prepend the new columns and upper-case the institution letter
df.out <- cbind(header, df) 
df.out$Institution <- toupper(df.out$Institution) 

If you have multiple subjects (generalized solution):

# Capture: one lowercase letter, the subject, one trailing digit
header <- as.data.frame(do.call(rbind, strsplit(gsub("^([a-z])(.*)([0-9])$", 
       "\\1 \\2 \\3", rownames(df)), " "))) 
names(header) <- c("Institution", "Subject", "Class") 
df.out <- cbind(header, df) 
df.out$Institution <- toupper(df.out$Institution) 
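
To try the generalized version end to end, here is a minimal sketch that rebuilds the first three rows from the question and applies the same pattern. The column names `ML1_SD` and `ML2_SD`, the `stringsAsFactors = FALSE` argument, and the final row-name reset are assumptions added for illustration; they are not part of the answer above.

```r
# Sample data copied from the question (first three rows only).
# ML1_SD / ML2_SD are stand-ins for the "ML1 SD" / "ML2 SD" columns.
df <- data.frame(
  ML1    = c(0.8730469, 0.8471074, 0.8593750),
  ML1_SD = c(0.3329205, 0.3598839, 0.3476343),
  ML2    = c(0.5950521, 0.6473829, 0.7031250),
  ML2_SD = c(0.4908820, 0.4777848, 0.4568810),
  row.names = c("aPhysics0", "aPhysics1", "aPhysics2")
)

# Put a space between the three captured groups, then split on the spaces.
parts <- strsplit(gsub("^([a-z])(.*)([0-9])$", "\\1 \\2 \\3", rownames(df)), " ")
header <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
names(header) <- c("Institution", "Subject", "Class")

df.out <- cbind(header, df)
df.out$Institution <- toupper(df.out$Institution)
rownames(df.out) <- NULL  # optional: drop the now-redundant row names
print(df.out)
```

Note that `Class` comes out as a character column here; wrap it in `as.integer()` if you need to sort or compute on it.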

Perfect, thank you! – bountiful 2013-02-26 15:28:06


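Note: the generalized solution assumes the class number has only one digit. To make it more flexible, make the middle regex group lazy and the last group greedy, like this: '^(\\w)(\\w+?)(\\d+)$'. If you also want to allow multiple institution letters, you need to filter on capitalization: assuming the subject always starts with a capital letter, you can change '^[a-z]' to '^[a-z]+?' and keep '(\\d+)$' at the end. – Dinre 2013-02-26 15:50:09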


@Dinre, '^[a-z]{0,n}', where n is the maximum length you expect. Likewise '[0-9]+$' at the end. That should be enough. – Arun 2013-02-26 16:27:18
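
The two comments above can be combined into one pattern. The following is a rough sketch, not code from either commenter: it assumes the institution is all lowercase, the subject starts with a capital and contains only letters, and the class is one or more digits. The row names in `rn` are hypothetical. Because the subject class excludes digits, no lazy quantifier is needed.

```r
# Hypothetical row names: multi-letter institution, multi-digit class.
rn <- c("aPhysics0", "abChemistry12", "cMaths103")

# Lowercase institution, capitalized letters-only subject, digits for the class.
pat <- "^([a-z]+)([A-Z][A-Za-z]*)([0-9]+)$"

header <- data.frame(
  Institution = toupper(sub(pat, "\\1", rn)),
  Subject     = sub(pat, "\\2", rn),
  Class       = as.integer(sub(pat, "\\3", rn)),
  stringsAsFactors = FALSE
)
print(header)
```

Row names that do not match the pattern are returned unchanged by `sub`, so it is worth checking with `grepl(pat, rn)` before trusting the result.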

3

Assuming the row names have the form (1 lowercase character, a string, 1 digit), you can use some regular expressions with gsub:

#test data 
x <- data.frame(ML1=runif(5), ML2=runif(5), row.names=paste0("aPhysics",1:5)) 

#logic 
pat <- "^([a-z])([a-zA-Z]+)([0-9])$" 
transform(x, 
          Institution = toupper(gsub(pat, "\\1", rownames(x))), 
          Subject     = gsub(pat, "\\2", rownames(x)), 
          Class       = gsub(pat, "\\3", rownames(x))) 
                 ML1       ML2 Institution Subject Class
aPhysics1 0.51680701 0.4102757           A Physics     1
aPhysics2 0.60388358 0.7438400           A Physics     2
aPhysics3 0.26504243 0.7598557           A Physics     3
aPhysics4 0.55900273 0.5263205           A Physics     4
aPhysics5 0.05589591 0.7903568           A Physics     5
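
As a possible alternative to repeating `gsub` once per column, `utils::strcapture` (available in R 3.4.0 and later) extracts all capture groups in a single pass and converts each to the type given in a prototype data frame. This is a sketch under the same row-name assumption as above, not the answerer's method; the fixed test values and the `parts` and `out` names are made up for illustration.

```r
# Test data with fixed, arbitrary values.
x <- data.frame(ML1 = c(0.1, 0.2, 0.3), ML2 = c(0.4, 0.5, 0.6),
                row.names = paste0("aPhysics", 1:3))

# One pass over the row names; proto sets the name and type of each group.
parts <- utils::strcapture(
  "^([a-z])([a-zA-Z]+)([0-9])$",
  rownames(x),
  proto = data.frame(Institution = character(), Subject = character(),
                     Class = integer(), stringsAsFactors = FALSE)
)
parts$Institution <- toupper(parts$Institution)

# New columns first, original measurements after them.
out <- cbind(parts, x)
print(out)
```

Here `Class` is already an integer column, and row names that fail to match yield `NA` in every captured column.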