2015-11-05

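Count the number of specific elements in the leaves of a pruned dendrogram

I'm doing a cluster analysis and want to count how often a variable occurs in the leaves of a pruned tree. Below is a simplified example in which the pruned tree has only three branches. I'd like to know how many As and Bs are in each of the three branches/leaves. How can I get these counts?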

set.seed(1)  # make the random example reproducible
mylabels <- rep(c("A", "B"), each = 10)   # 10 A's followed by 10 B's
# 20 observations x 100 variables (2000 draws, so no column is a recycled copy)
myclusterdata <- matrix(rexp(2000, rate = 0.1), nrow = 20, ncol = 100)
rownames(myclusterdata) <- mylabels

hc <- hclust(dist(myclusterdata), method = "average")
memb <- cutree(hc, k = 3)   # cluster (leaf) id of every observation, named by its label

# centroid of each of the 3 clusters
cent <- NULL
for (k in 1:3) {
    cent <- rbind(cent, colMeans(myclusterdata[memb == k, , drop = FALSE]))
}

hc1 <- hclust(dist(cent)^2, method = "centroid", members = table(memb))
# whole tree
plot(as.dendrogram(hc), horiz = TRUE)
# pruned tree (only 3 branches)
plot(as.dendrogram(hc1), horiz = TRUE)
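The centroid loop above grows cent with rbind() one cluster at a time. A loop-free sketch of the same computation uses base R's rowsum() to add up the rows of each cluster and then divides by the cluster sizes. The small matrix x and grouping g below are hypothetical illustration data standing in for myclusterdata and memb.

```r
# Hypothetical toy data: 5 observations (rows), 2 variables (columns)
x <- matrix(c(1, 2, 3, 10, 20,
              4, 5, 6, 40, 50), ncol = 2)
g <- c(1, 1, 1, 2, 2)   # cluster ids, as cutree() would return them

# Loop version, as in the question
cent_loop <- NULL
for (k in 1:2) {
  cent_loop <- rbind(cent_loop, colMeans(x[g == k, , drop = FALSE]))
}

# rowsum() sums the rows of each group (groups in sorted order);
# dividing by the group sizes turns the sums into column means.
# The length-k size vector is recycled down the rows, so row i is divided by size i.
cent_rowsum <- rowsum(x, g) / as.vector(table(g))

all.equal(cent_loop, cent_rowsum, check.attributes = FALSE)
```

With the question's objects this would read cent <- rowsum(myclusterdata, memb) / as.vector(table(memb)). Because both rowsum() and table() order the clusters by id, the rows should line up with the members = table(memb) argument passed to hclust().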

Answer


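OK, I figured it out. The leaf that each observation ends up in is stored in memb (the output of cutree()), and the A/B labels are the names of that vector. So counting each label within each leaf gives the result. Here is the example code: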

set.seed(1)  # make the random example reproducible
mylabels <- rep(c("A", "B"), each = 10)   # 10 A's followed by 10 B's
# 20 observations x 100 variables (2000 draws, so no column is a recycled copy)
myclusterdata <- matrix(rexp(2000, rate = 0.1), nrow = 20, ncol = 100)
rownames(myclusterdata) <- mylabels

hc <- hclust(dist(myclusterdata), method = "average")
memb <- cutree(hc, k = 3)   # cluster (leaf) id of every observation, named by its label

# centroid of each of the 3 clusters
cent <- NULL
for (k in 1:3) {
    cent <- rbind(cent, colMeans(myclusterdata[memb == k, , drop = FALSE]))
}

hc1 <- hclust(dist(cent)^2, method = "centroid", members = table(memb))
# whole tree
plot(as.dendrogram(hc), horiz = TRUE)
# pruned tree (only 3 branches)
plot(as.dendrogram(hc1), horiz = TRUE)

# count how often each label (A, B) occurs in each leaf of the pruned tree
var_of_interest <- levels(as.factor(names(memb)))   # the labels: "A", "B"
leaf_number <- levels(as.factor(memb))               # the leaves: "1", "2", "3"

counter <- matrix(0, nrow = length(leaf_number), ncol = length(var_of_interest),
                  dimnames = list(leaf_number, var_of_interest))
for (i in seq_along(leaf_number)) {
    for (j in seq_along(var_of_interest)) {
        counter[i, j] <- sum(names(memb) == var_of_interest[j] & memb == leaf_number[i])
    }
}
counter
# share of B in each leaf
counter[, 2] / (counter[, 1] + counter[, 2])
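The nested loops can also be replaced by a single cross-tabulation. A minimal sketch, assuming memb is a named vector like the one cutree() returns here (cluster ids as values, A/B labels as names); the values below are made up for illustration:

```r
# Hypothetical cutree() output: leaf id per observation, named by its label
memb <- c(1, 1, 2, 3, 3, 3, 2, 1)
names(memb) <- c("A", "A", "A", "B", "B", "A", "B", "B")

# Rows = leaves of the pruned tree, columns = labels
counts <- table(leaf = memb, label = names(memb))
counts   # leaf 1: 2 A, 1 B; leaf 2: 1 A, 1 B; leaf 3: 1 A, 2 B

# Row-wise proportions: each leaf's counts divided by the size of that leaf
shares <- prop.table(counts, margin = 1)
shares[, "B"]   # share of B in each leaf
```

With the question's data, counts should hold the same numbers as counter, and shares[, "B"] corresponds to the last line of the loop version. A practical advantage is that table() picks up any additional label as a new column without changing the code.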