Using R - Condensing multiple columns into a new column without repeating content

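I am a botanist and a beginner R user. I wonder if you could help me find a solution for a script I am writing. I have been using R to streamline the process of creating text from spreadsheets. For that I use the monographaR package, which works fine for me. The problem is in handling the data.frame. My spreadsheet (a CSV file) basically consists of species columns, character rows, and the cells where they intersect. I would like a final script that lets me merge two or more columns into a new column of the original spreadsheet. When the cells have different contents, the new cell must list the distinct contents separated by a comma and a space (", "). When the cells have the same content, the new cell must contain that content only once, without repeating it. The scripts I tried to write with concatenate, cbind, etc. repeated the cell contents, which is not what I want.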

My initial CSV looks like this:

        cattleya.minor cattleya.maxima cattleya.pumila
colour  red            red             red
surface sharp          smooth          sharp
leaves  1              3               4

and I would like a final result like this:

        cattleya      cattleya.minor cattleya.maxima cattleya.pumila
colour  red           red            red             red
surface sharp, smooth sharp          smooth          sharp
leaves  1, 3, 4       1              3               4

Thank you very much indeed.

Source

2016-08-12 T. M.

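Your data isn't tidy (http://vita.had.co.nz/papers/tidy-data.pdf), because you've got different types of data (strings, integers) within the same column. It would be better to transform the data so that each column is a variable and each row is an observation. – alistaire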

As @alistaire commented, things will be easier if you start with "tidy" data.

# Starting data (which I've called "dat") 
dat

 cattleya.minor cattleya.maxima cattleya.pumila 
colour    red    red    red 
surface   sharp   smooth   sharp 
leaves    1    3    4

library(reshape2) 
library(tibble) 
library(dplyr) 

# Make data tidy 
dat.tidy = dat %>% 
    rownames_to_column(var="Characteristic") %>%    # Turn rownames into a data column 
    melt(id.var="Characteristic", variable.name="Species") %>% # Reshape to "long" format 
    dcast(Species ~ Characteristic)        # Cast back to wide so that each characteristic gets its own column 

dat.tidy

          Species colour leaves surface
1  cattleya.minor    red      1   sharp
2 cattleya.maxima    red      3  smooth
3 cattleya.pumila    red      4   sharp

# Summarize by genus 
dat.tidy %>% 
    group_by(Genus=gsub("(.*)\\..*","\\1",Species)) %>%  # Collapse to genus (remove species designation) 
    summarise_all(funs(paste(unique(.), collapse=", "))) %>% # For each characteristic, paste together the unique values for a given genus
    select(-Species)

     Genus colour  leaves       surface
1 cattleya    red 1, 3, 4 sharp, smooth
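If the goal is the exact layout shown in the question (a combined column added in front of the original species columns), a base-R variant along the following lines might also work. This is only a sketch under a few assumptions: the data are rebuilt by hand here rather than read from the CSV, `collapse_unique` is a hypothetical helper name, and the columns to merge are picked by the `cattleya.` prefix.

```r
# Example data rebuilt by hand from the question. In practice it would come
# from the CSV, e.g. read.csv("dat.csv", row.names = 1) (assumed file layout).
dat <- data.frame(
  cattleya.minor  = c("red", "sharp", "1"),
  cattleya.maxima = c("red", "smooth", "3"),
  cattleya.pumila = c("red", "sharp", "4"),
  row.names = c("colour", "surface", "leaves"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each row, keep every distinct value once
# and paste the distinct values together with ", ".
collapse_unique <- function(df, sep = ", ") {
  apply(as.matrix(df), 1, function(x) paste(unique(trimws(x)), collapse = sep))
}

# Columns to merge: here, every column whose name starts with "cattleya."
cols <- grep("^cattleya\\.", names(dat), value = TRUE)

# Put the combined column first, followed by the original columns.
result <- data.frame(
  cattleya = collapse_unique(dat[, cols, drop = FALSE]),
  dat,
  stringsAsFactors = FALSE
)
result
```

For the sample data, the new `cattleya` column should hold "red", "sharp, smooth" and "1, 3, 4", matching the result requested in the question. Unlike the tidy approach above, this keeps characters in rows, so it is convenient for output but less so for further analysis.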

Source

2016-08-12 02:32:20 eipi10

Thanks @alistaire & @eipi10!

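eipi10, I am glad to be getting close to my goal. I ran the script exactly as you suggested, with the same dataset. It works well, but there seems to be a small problem in the last block of commands, on the select(-Species) line. Could you check it? R returns the following: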

> dat <- read.csv("dat.csv") 
> dat 
     cattleya.minor cattleya.maxima cattleya.pumila 
color    red    red    red 
surface   sharp   smooth   sharp 
leaves    1    3    4 
> 
> # Make data tidy 
> dat.tidy = dat %>% 
+ rownames_to_column(var="Characteristic") %>%    # Turn  rownames into a data column 
+ melt(id.var="Characteristic", variable.name="Species") %>% # Reshape to "long" format 
+ dcast(Species ~ Characteristic)        # Cast back to wide so that each characteristic gets its own column 
Warning message: 
attributes are not identical across measure variables; they will be dropped 
> 
> dat.tidy 
      Species color leaves surface 
1 cattleya.minor red  1 sharp 
2 cattleya.maxima red  3 smooth 
3 cattleya.pumila red  4 sharp 
> 
> # Summarize by genus 
> dat.tidy %>% 
+ group_by(Genus=gsub("(.*)\\..*","\\1",Species)) %>% # Collapse to genus (remove species designation) 
+ summarise_all(funs(paste(unique(.), collapse=", "))) # For each charactreristic, paste together each unique value for a given genus 
# A tibble: 1 x 5 
    Genus           Species color leaves   surface 
    <chr>           <chr> <chr> <chr>   <chr> 
1 cattleya cattleya.minor, cattleya.maxima, cattleya.pumila red 1, 3, 4 sharp, smooth 
> select(-Species) 
Error in select_(.data, .dots = lazyeval::lazy_dots(...)) : 
    objeto 'Species' não encontrado (my free translation: object 'Species' not found) 
>

Source

2016-08-12 16:56:35

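That's because I accidentally deleted the '%>%' at the end of the line before 'select(-Species)' while editing my answer. Sorry about that. I've fixed it now. Without the '%>%' at the end of the previous line, R treats 'select(-Species)' as a separate statement, which causes the error. 'select(-Species)' just removes the 'Species' column, so if you want to keep the 'Species' column in the summarised output, you can delete that line. – eipi10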
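The effect of the missing '%>%' can be checked without running any dplyr code. As a small sketch (the pipeline strings below are simplified stand-ins for the real code, and 'f' is only a placeholder), base R's parse() reports how many separate statements R sees:

```r
# Without a trailing %>% the parser sees two separate statements,
# so select(-Species) would be evaluated on its own.
broken <- "dat.tidy %>% summarise_all(f)\nselect(-Species)"

# With a trailing %>% the next line continues the same statement.
fixed <- "dat.tidy %>% summarise_all(f) %>%\nselect(-Species)"

# parse() only reads the code; nothing is evaluated, so no packages are needed.
n_broken <- length(parse(text = broken))  # 2 top-level expressions
n_fixed  <- length(parse(text = fixed))   # 1 top-level expression
n_broken
n_fixed
```

The same rule applies to any binary operator in R: a line that ends with the operator is continued on the next line, while a line that is already a complete expression is not.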

Fantastic solution! Thank you very much.
