由元數據（素食包）着色稀疏曲線（phyloseq包）

第一次問題提交在這裏。我無法在其他帖子中找到這個問題的答案（愛堆棧交換，順便說一句）。由元數據（素食包）着色稀疏曲線（phyloseq包）

無論如何... 我通過素食包創建了一個稀疏曲線，並且我得到了一個非常混亂的情節，在情節底部有一個非常粗的黑色條，遮蔽了一些低多樣性樣本行。理想情況下，我想用我所有的線條（169;我可以將其減少到144）生成一個情節，但製作一個複合圖形，按樣本年繪製顏色，併爲每個池塘製作不同類型的線條（即：2個樣本年：2016年，2017年和3個池塘：1,2,5）。我使用phyloseq創建了一個包含我所有數據的對象，然後將我的元數據表與我的元數據分離爲不同的對象（jt = OTU表和sampledata =元數據）。我當前的代碼：

jt <- as.data.frame(t(j)) # transform it to make it compatible with the proceeding commands 
rarecurve(jt 
      , step = 100 
      , sample = 6000 
      , main = "Alpha Rarefaction Curve" 
      , cex = 0.2 
      , color = sampledata$PondYear) 

# A very small subset of the sample metadata 
        Pond Year 
F16.5.d.1.1.R2  5  2016 
F17.1.D.6.1.R1  1  2017 
F16.1.D15.1.R3  1  2016 
F17.2.D00.1.R2  2  2017

enter image description here

來源

2017-11-11 Samaj Oruel

下面是如何繪製與ggplot一個稀疏曲線的例子。我使用了bioconductor提供的phyloseq包裝中的數據。

安裝phyloseq：

source('http://bioconductor.org/biocLite.R') 
biocLite('phyloseq') 
library(phyloseq)

其他庫需要

library(tidyverse) 
library(vegan)

數據：

mothlist <- system.file("extdata", "esophagus.fn.list.gz", package = "phyloseq") 
mothgroup <- system.file("extdata", "esophagus.good.groups.gz", package = "phyloseq") 
mothtree <- system.file("extdata", "esophagus.tree.gz", package = "phyloseq") 
cutoff <- "0.10" 
esophman <- import_mothur(mothlist, mothgroup, mothtree, cutoff)

提取OTU表，轉置，並轉換爲數據幀

otu <- otu_table(esophman) 
otu <- as.data.frame(t(otu)) 
sample_names <- rownames(otu) 

out <- rarecurve(otu, step = 5, sample = 6000, label = T)

現在你有一個列表中的每個元素對應一個樣本：

清洗名單了一下：

rare <- lapply(out, function(x){ 
    b <- as.data.frame(x) 
    b <- data.frame(OTU = b[,1], raw.read = rownames(b)) 
    b$raw.read <- as.numeric(gsub("N", "", b$raw.read)) 
    return(b) 
})

標籤列表

names(rare) <- sample_names

轉換成數據幀：

rare <- map_dfr(rare, function(x){ 
    z <- data.frame(x) 
    return(z) 
}, .id = "sample")

讓我們看看它是怎麼樣的：

head(rare) 
    sample  OTU raw.read 
1  B 1.000000  1 
2  B 5.977595  6 
3  B 10.919090  11 
4  B 15.826125  16 
5  B 20.700279  21 
6  B 25.543070  26

與GGPLOT2情節

ggplot(data = rare)+ 
    geom_line(aes(x = raw.read, y = OTU, color = sample))+ 
    scale_x_continuous(labels = scales::scientific_format())

素食主義者陰謀：

rarecurve(otu, step = 5, sample = 6000, label = T) #low step size because of low abundance

一個可以使一個附加列根據這個分組和顏色。

以下是如何添加另一個分組的示例。讓我們假設你有如下形式的表：

groupings <- data.frame(sample = c("B", "C", "D"), 
         location = c("one", "one", "two"), stringsAsFactors = F) 
groupings 
    sample location 
1  B  one 
2  C  one 
3  D  two

其中樣本根據另一特徵分組。您可以使用lapply或map_dfr越過groupings$sample和標籤rare$location。

rare <- map_dfr(groupings$sample, function(x){ #loop over samples 
    z <- rare[rare$sample == x,] #subset rare according to sample 
    loc <- groupings$location[groupings$sample == x] #subset groupings according to sample, if more than one grouping repeat for all 
    z <- data.frame(z, loc) #make a new data frame with the subsets 
    return(z) 
}) 

head(rare) 
    sample  OTU raw.read loc 
1  B 1.000000  1 one 
2  B 5.977595  6 one 
3  B 10.919090  11 one 
4  B 15.826125  16 one 
5  B 20.700279  21 one 
6  B 25.543070  26 one

讓我們做一個像樣的情節了這個

ggplot(data = rare)+ 
    geom_line(aes(x = raw.read, y = OTU, group = sample, color = loc))+ 
    geom_text(data = rare %>% #here we need coordinates of the labels 
       group_by(sample) %>% #first group by samples 
       summarise(max_OTU = max(OTU), #find max OTU 
         max_raw = max(raw.read)), #find max raw read 
       aes(x = max_raw, y = max_OTU, label = sample), check_overlap = T, hjust = 0)+ 
    scale_x_continuous(labels = scales::scientific_format())+ 
    theme_bw()

來源

2017-11-11 08:33:23 missuse

哇！這很棘手！現在我明白爲什麼人們認爲R很痛苦使用。沒有一個簡單的方法來做到這一點？ –

@Jari Oksanen：來自'vegan'的'rerecurve'函數有'col'和'lty'的參數，可以指定這些參數，如果'ggplot'沒有打算，可以手動添加一個圖例。 – missuse

由元數據（素食包）着色稀疏曲線（phyloseq包）

回答

相關問題