2015-11-03

I have a data set like the one below, and I want to revalue the attributes in several of its columns.

dat1 <- read.table(header=TRUE, text=" 
ID Pa Gu Ta 
8645 Rel345 Gel294 Tel452 
6228 Rel345 Gel294 Tel467 
5830 Rel345 Gel294 Tel467 
1844 Rel345 Gel295 Tel467 
4461 Rel345 Gel295 Tel467 
2119 Rel345 Gel294 Tel452 
1821 Rel345 Gel294 Tel467 
6851 Rel345 Gel294 Tel467 
4214 Rel345 Gel294 Tel452 
2589 Rel346 Gel294 Tel467 
2116 Rel347 Gel294 Tel452 
8523 Rel348 Gel295 Tel468 
2603 Rel348 Gel295 Tel468 
2801 Rel348 Gel295 Tel452 
1485 Rel348 Gel295 Tel468 
2116 Rel348 Gel295 Tel452 
8753 Rel348 Gel295 Tel452 
4277 Rel348 Gel295 Tel468 
7053 Rel348 Gel295 Tel468 
3320 Rel348 Gel295 Tel452 
7974 Rel348 Gel295 Tel468 
        ") 
dat1 
    ID  Pa  Gu  Ta 
1 8645 Rel345 Gel294 Tel452
2 6228 Rel345 Gel294 Tel467
3 5830 Rel345 Gel294 Tel467
4 1844 Rel345 Gel295 Tel467
5 4461 Rel345 Gel295 Tel467
6 2119 Rel345 Gel294 Tel452
7 1821 Rel345 Gel294 Tel467
8 6851 Rel345 Gel294 Tel467
9 4214 Rel345 Gel294 Tel452
10 2589 Rel346 Gel294 Tel467
11 2116 Rel347 Gel294 Tel452
12 8523 Rel348 Gel295 Tel468
13 2603 Rel348 Gel295 Tel468
14 2801 Rel348 Gel295 Tel452
15 1485 Rel348 Gel295 Tel468
16 2116 Rel348 Gel295 Tel452
17 8753 Rel348 Gel295 Tel452
18 4277 Rel348 Gel295 Tel468
19 7053 Rel348 Gel295 Tel468
20 3320 Rel348 Gel295 Tel452
21 7974 Rel348 Gel295 Tel468

The attributes in the three right-hand columns should be recoded as follows:

dat2 <- read.table(header=TRUE, text=" 
Att New_Att 
Rel345 Rel_123 
Rel346 Rel_124 
Rel347 Rel_125 
Rel348 Rel_126 
Gel294 Gela_134 
Gel295 Gela_135 
Tel452 Tel_111 
Tel467 Tel_112 
Tel468 Tel_113 

        ") 
dat2 
    Att New_Att 
1 Rel345 Rel_123 
2 Rel346 Rel_124 
3 Rel347 Rel_125 
4 Rel348 Rel_126 
5 Gel294 Gela_134 
6 Gel295 Gela_135 
7 Tel452 Tel_111 
8 Tel467 Tel_112 
9 Tel468 Tel_113 

Using the plyr package (via its revalue function), I can make the change like this:

library(plyr) 
dat1$Pa<- revalue(dat1$Pa, c("Rel345"="Rel_123","Rel346"="Rel_124","Rel347"="Rel_125", 
"Rel348"="Rel_126")) 
dat1$Gu<- revalue(dat1$Gu, c("Gel294"="Gela_134","Gel295"="Gela_135")) 
dat1$Ta<- revalue(dat1$Ta, c("Tel452"="Tel_111","Tel467"="Tel_112","Tel468"="Tel_113")) 

dat1 
    ID  Pa  Gu  Ta 
1 8645 Rel_123 Gela_134 Tel_111 
2 6228 Rel_123 Gela_134 Tel_112 
3 5830 Rel_123 Gela_134 Tel_112 
4 1844 Rel_123 Gela_135 Tel_112 
5 4461 Rel_123 Gela_135 Tel_112 
6 2119 Rel_123 Gela_134 Tel_111 
7 1821 Rel_123 Gela_134 Tel_112 
8 6851 Rel_123 Gela_134 Tel_112 
9 4214 Rel_123 Gela_134 Tel_111 
10 2589 Rel_124 Gela_134 Tel_112 
11 2116 Rel_125 Gela_134 Tel_111 
12 8523 Rel_126 Gela_135 Tel_113 
13 2603 Rel_126 Gela_135 Tel_113 
14 2801 Rel_126 Gela_135 Tel_111 
15 1485 Rel_126 Gela_135 Tel_113 
16 2116 Rel_126 Gela_135 Tel_111 
17 8753 Rel_126 Gela_135 Tel_111 
18 4277 Rel_126 Gela_135 Tel_113 
19 7053 Rel_126 Gela_135 Tel_113 
20 3320 Rel_126 Gela_135 Tel_111 
21 7974 Rel_126 Gela_135 Tel_113 

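My real data set has 1 million rows, and some of the variables have more than 200 categories, so spelling out every pair as in the code above is not practical. I would like to change the attribute names by reading the recodings from dat2.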

Answer


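We loop through the columns of 'dat1' except the 'ID' column, match each column against 'Att' from 'dat2', and use the resulting numeric index to replace the column elements with the corresponding elements of 'New_Att':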
dat1[-1] <- lapply(dat1[-1], function(x) dat2$New_Att[match(x, dat2$Att)]) 
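One thing to watch with this approach: match returns NA for any value that has no row in the lookup table, so such values would become NA after recoding. Below is a minimal sketch of a variant that keeps the original value in that case. The names toy1, toy2 and recode_column are hypothetical, the data is a small made-up subset of the tables above, and "Gel999" is an invented value added only to show the unmatched case.

```r
# Hypothetical stand-ins for dat1 and dat2, read as character columns
toy1 <- data.frame(ID = c(8645, 1844, 8523),
                   Pa = c("Rel345", "Rel345", "Rel348"),
                   Gu = c("Gel294", "Gel295", "Gel999"),  # "Gel999" has no recoding
                   stringsAsFactors = FALSE)
toy2 <- data.frame(Att = c("Rel345", "Rel348", "Gel294", "Gel295"),
                   New_Att = c("Rel_123", "Rel_126", "Gela_134", "Gela_135"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: recode x using the pairs (from, to),
# leaving values without a match unchanged instead of turning them into NA
recode_column <- function(x, from, to) {
  x <- as.character(x)
  idx <- match(x, from)           # position of each value in `from`, NA if absent
  ifelse(is.na(idx), x, to[idx])  # fall back to the original value when unmatched
}

# Apply the helper to every column except ID
toy1[-1] <- lapply(toy1[-1], recode_column,
                   from = as.character(toy2$Att),
                   to = as.character(toy2$New_Att))
toy1
```

If every value is guaranteed to appear in the lookup table, the shorter one-liner above is enough and the fallback is unnecessary.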

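Alternatively, we can convert the data set to a matrix, use match as before, and restore the dimensions of dat1[-1] on the result: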

`dim<-`(dat2[,2][match(as.matrix(dat1[-1]), dat2[,1])], dim(dat1[-1]))
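The matrix version only returns the recoded values, so they still need to be put back into the data frame. Here is a rough sketch of one way to do that, written out step by step. It assumes the columns are character rather than factor, and mini1 and mini2 are hypothetical small stand-ins for dat1 and dat2.

```r
# Hypothetical stand-ins for dat1 and dat2, read as character columns
mini1 <- data.frame(ID = c(8645, 6228, 8523),
                    Pa = c("Rel345", "Rel345", "Rel348"),
                    Ta = c("Tel452", "Tel467", "Tel468"),
                    stringsAsFactors = FALSE)
mini2 <- data.frame(Att = c("Rel345", "Rel348", "Tel452", "Tel467", "Tel468"),
                    New_Att = c("Rel_123", "Rel_126", "Tel_111", "Tel_112", "Tel_113"),
                    stringsAsFactors = FALSE)

m <- as.matrix(mini1[-1])                       # character matrix of the attribute columns
recoded <- mini2$New_Att[match(m, mini2$Att)]   # match() treats the matrix as one long vector
dim(recoded) <- dim(m)                          # restore the rows-by-columns shape

# Write the recoded columns back; the column names of mini1 are kept
mini1[-1] <- as.data.frame(recoded, stringsAsFactors = FALSE)
mini1
```

This does a single match call over all attribute columns at once instead of one per column; whether that is noticeably faster on 1 million rows would need to be timed on the real data.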