2017-08-04

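ggplot2: reordering a heatmap based on hierarchical clustering

Although I found fairly similar questions, I am still struggling with ggplot2 and have not managed to get this working. I want to reorder both the rows and the columns of my heatmap according to a hierarchical clustering.

Here is my current code: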

# import 
library("ggplot2") 
library("scales") 
library("reshape2") 

# data loading 
data_frame = read.csv(file=input_file, header=TRUE, row.names=1, sep='\t') 

# clustering with hclust on rows and on columns
# dist() measures distances between rows, so transpose to cluster the columns
dd.row <- as.dendrogram(hclust(dist(data_frame)))
dd.col <- as.dendrogram(hclust(dist(t(data_frame))))

# ordering based on clustering
row.ord <- order.dendrogram(dd.row)
col.ord <- order.dendrogram(dd.col)

# making a new, reordered data frame
new_df = as.data.frame(data_frame[row.ord, col.ord])
print(new_df) # when inspecting new_df manually, the reordering seems to work

# get the row name 
name = as.factor(row.names(new_df)) 

# reshape 
melte_df = melt(cbind(name, new_df)) 

# the solution: reset the factor levels of the name column so they follow the clustering
melte_df$name = factor(melte_df$name, levels = row.names(data_frame)[as.vector(row.ord)]) 

# ggplot2 dark magic
(p <- ggplot(melte_df, aes(variable, name)) +
    geom_tile(aes(fill = value), colour = "white") +
    scale_fill_gradient(low = "white", high = "steelblue") +
    theme(text = element_text(size = 12),
          axis.text.y = element_text(size = 3)))

# save fig
ggsave(filename = "test.pdf", plot = p)

# The result is ordered only by column. What have I missed?

I am fairly new to R, so a detailed answer would be very welcome.
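
One detail worth checking in code like the above is which dimension each clustering actually orders. `dist()` computes distances between rows, so clustering `dist(df)` yields a row order, while clustering `dist(t(df))` yields a column order. The following is a minimal sketch, assuming the built-in `mtcars` dataset as a stand-in for the question's input file (which is not available); the names `m` and `reordered` are illustrative only.

```r
# Sketch on a built-in dataset (an assumption; the question's data is not available).
m <- as.matrix(scale(mtcars))   # 32 rows (cars) x 11 columns (variables)

# dist() works on rows, so this clusters the cars.
row.ord <- order.dendrogram(as.dendrogram(hclust(dist(m))))

# Transposing first makes the columns the objects being clustered.
col.ord <- order.dendrogram(as.dendrogram(hclust(dist(t(m)))))

length(row.ord)   # one index per row
length(col.ord)   # one index per column

# The row order belongs in the first index, the column order in the second.
reordered <- m[row.ord, col.ord]
```

If the two order vectors are swapped, indexing a non-square table fails or silently selects the wrong entries, so comparing their lengths against `nrow()` and `ncol()` is a cheap sanity check.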

Answers

Without a reproducible example dataset I am not 100% sure this is the cause, but my guess is that your problem comes from this line:

name = as.factor(row.names(new_df)) 

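When you use a factor, ordering is based on the order of the factor's levels. You can reorder your data frame however you like, but the order used when plotting will still be the order of the levels.

Here is an example: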

data_frame <- data.frame(x = c("apple", "banana", "peach"), y = c(50, 30, 70)) 
data_frame 
       x  y
1  apple 50
2 banana 30
3  peach 70

data_frame$x <- as.factor(data_frame$x) # Make x column a factor 

levels(data_frame$x) # This shows the levels of your factor 
[1] "apple" "banana" "peach" 

data_frame <- data_frame[order(data_frame$y),] # Order by value of y 
data_frame 
       x  y
2 banana 30
1  apple 50
3  peach 70

# Now let's plot it:
library(ggplot2)
p <- ggplot(data_frame, aes(x)) + geom_bar(aes(weight = y))
p

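This is the result:

[Figure: bar chart with the bars in the order apple, banana, peach]

See? It is not ordered by the y values as we wanted. It is ordered by the levels of the factor. If that is indeed your problem, there is a solution here: R - Order a factor based on value in one or more other columns

Applying the dplyr solution to the example: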

library(dplyr) 
data_frame <- data_frame %>%
  arrange(y) %>%                     # sort your data frame
  mutate(x = factor(x, levels = x))  # reset the factor levels to that order

data_frame 
       x  y
1 banana 30
2  apple 50
3  peach 70

levels(data_frame$x) # Levels of the factor are reordered! 
[1] "banana" "apple" "peach" 

p <- ggplot(data_frame, aes(x)) + geom_bar(aes(weight=y)) 
p 

This is the result now:

[Figure: bar chart with the bars in the order banana, apple, peach]
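
For readers who prefer not to depend on dplyr, the same releveling can be sketched in base R. This is an illustrative alternative, not part of the original answer; `df` and `df2` are hypothetical names for copies of the example data.

```r
df <- data.frame(x = c("apple", "banana", "peach"), y = c(50, 30, 70))

# Option 1: spell out the levels in order of increasing y.
df$x <- factor(df$x, levels = df$x[order(df$y)])
levels(df$x)
# [1] "banana" "apple"  "peach"

# Option 2: reorder() sorts the levels by a per-level summary of y (the mean by default).
df2 <- data.frame(x = c("apple", "banana", "peach"), y = c(50, 30, 70))
df2$x <- reorder(factor(df2$x), df2$y)
levels(df2$x)
# [1] "banana" "apple"  "peach"
```

Option 1 requires the values passed as `levels` to be unique, which holds here because each fruit appears once. Option 2 also works when a level occurs in several rows.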

I hope this helps. If not, you may want to share a sample of your original dataset!

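Your answer was really useful for pinpointing the problem. In the end, though, I found a more convenient approach: reordering the factor levels directly. I will edit my question to add what made it work. Thanks again for your help.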
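
The approach described in this comment, setting the factor levels to the clustered order, can be sketched end to end as follows. This is a minimal illustration under stated assumptions: the built-in `mtcars` dataset replaces the question's input file, the reshape is done with base R instead of `melt()`, and the names `m` and `long` are hypothetical. The plotting step runs only if ggplot2 is installed.

```r
# Built-in data stands in for the question's file (an assumption).
m <- as.matrix(scale(mtcars))

# Cluster rows and columns separately; $order is hclust's own leaf order.
row.ord <- hclust(dist(m))$order
col.ord <- hclust(dist(t(m)))$order

# Long format with base R: one row per (name, variable) cell.
long <- data.frame(
  name     = rownames(m)[row(m)],
  variable = colnames(m)[col(m)],
  value    = as.vector(m)
)

# ggplot2 draws discrete axes in factor-level order, so set the levels
# to the clustered order on both axes.
long$name     <- factor(long$name,     levels = rownames(m)[row.ord])
long$variable <- factor(long$variable, levels = colnames(m)[col.ord])

# Plot only if ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(variable, name, fill = value)) +
    geom_tile(colour = "white") +
    scale_fill_gradient(low = "white", high = "steelblue")
  print(p)
}
```

Because the order lives in the factor levels, the rows of `long` themselves do not need to be sorted.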