2016-10-03

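Summing rows with duplicate identifiers in tidyr::spread

I'm working with some oddly formatted survey data (collected and recorded by someone else). It records species abundance on survey transects, but it lists only the species observed on a given transect rather than every possible species. I spent some time working out how to reshape the data with tidyr so that each species gets its own column for each survey, with unrecorded species filled in as 0. Here is a short, reproducible example: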

# This works:
library(dplyr)   # provides %>% and group_by()
library(tidyr)   # provides spread()

Survey <- as.factor(c(rep("Survey 1",10),rep("Survey 2",10),rep("Survey 3",10)))
Species <- as.factor(c(c("A","B","C","D","E","U","V","W","X","Y"),c("A","C","E","G","I","K","M","O","Q","S"),c("B","D","F","H","J","L","N","P","R","T")))
Abundance <- ceiling(runif(30,1,50))

working.df <- cbind.data.frame(Survey, Species, Abundance)

# One row per survey, one column per species, 0 where a species was not recorded
working.spread <- working.df %>%
    group_by(Survey) %>%
    spread(Species, Abundance, drop = FALSE, fill = 0)
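
To make the target shape concrete, here is a minimal sketch on a tiny, made-up data frame (the `toy` name and its values are assumptions for illustration, not part of the real survey data). Species B is absent from survey S2, so `fill = 0` should supply the zero:

```r
library(tidyr)

# Hypothetical toy data: species B was not observed in survey S2
toy <- data.frame(
  Survey    = c("S1", "S1", "S2"),
  Species   = c("A", "B", "A"),
  Abundance = c(3, 5, 2)
)

# Long to wide: one column per species, missing combinations filled with 0
wide <- spread(toy, Species, Abundance, fill = 0)
print(wide)
```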

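Unfortunately, the real data aren't this simple. In some cases, the same species was recorded on multiple rows within a single survey so that information on other variables, which I'm not interested in, could be recorded. I only care about the total abundance per survey. So here is an example of what the real data might look like (note the double "A" at the start of Species2):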

# This doesn't work (reuses Survey and Abundance from the example above):
Species2 <- as.factor(c(c("A","A","C","D","E","U","V","W","X","Y"),c("A","C","E","G","I","K","M","O","Q","S"),c("B","D","F","H","J","L","N","P","R","T")))

not.working.df <- cbind.data.frame(Survey, Species2, Abundance)

# Rows 1 and 2 share the same Survey/Species2 pair, so spread() fails
not.working.spread <- not.working.df %>%
    group_by(Survey) %>%
    spread(Species2, Abundance, drop = FALSE, fill = 0)

So when the same species is listed twice in one survey, the spread call no longer works and returns the familiar error:

Error: Duplicate identifiers for rows (1, 2) 

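And in the real dataset I get quite a few of these duplicates (and this is just one of several datasets), so I certainly don't want to go through and fix them by hand: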

Error: Duplicate identifiers for rows (206, 216), (1532, 1544), (1052, 1595), (1324, 1330), (191, 212), (194, 211), (1392, 1600), (19, 37), (1404, 1599), (199, 215), (1073, 1596), (1074, 1597), (43, 44, 45), (455, 456), (380, 381, 382, 383), (447, 448), (413, 414, 415, 416, 417, 418), (303, 304), (1015, 1016), (897, 898, 1593), (1306, 1307), (1041, 1594), (1076, 1598), (1425, 1426), (49, 64), (198, 214) 
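
One way to see which survey/species combinations are behind an error like this is to count rows per combination and keep those that occur more than once. A minimal sketch, assuming dplyr is available; the `toy` data frame and its values are made up for illustration:

```r
library(dplyr)

# Hypothetical toy data: species "A" appears twice in Survey 1
toy <- data.frame(
  Survey    = c("Survey 1", "Survey 1", "Survey 1", "Survey 2"),
  Species2  = c("A", "A", "C", "A"),
  Abundance = c(5, 7, 3, 2)
)

# Count rows per Survey/Species2 pair; n > 1 marks a duplicated identifier
dupes <- toy %>%
  count(Survey, Species2) %>%
  filter(n > 1)

print(dupes)
```

Each row of `dupes` is a combination that `spread()` cannot place in a single cell.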

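What I want to do is sum the Abundance field across the duplicate identifiers. I know there are similar questions here, and I've looked through many of them, but I haven't found a solution yet. I've been trying to do this with spread, and it seems like I'm one simple function call away from getting it to work... Any advice would be much appreciated. Or, if I've completely missed an existing answer to this question, please point me in its direction.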

Cheers

Sounds like you need to summarize the dataset before spreading. [This answer](http://stackoverflow.com/a/35228491/2461552) gives a good explanation of the process. – aosmith

Thanks, that did it! Solution below. – stewart6

Answer

Thanks, aosmith, for pointing me toward the thread on summarizing, which did the trick. Here's the working solution:

# Collapse duplicate Survey/Species2 rows to their total abundance, then spread
not.working.spread <- not.working.df %>%
    group_by(Survey, Species2) %>%
    summarize(Abundance = sum(Abundance)) %>%
    spread(Species2, Abundance, drop = FALSE, fill = 0)
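
For anyone on a newer tidyr (1.0.0 or later, as far as I know), `pivot_wider()` can apparently do the summing during the reshape itself through its `values_fn` argument. This is a sketch of that alternative rather than the approach used above, and the `toy` data frame is made up for illustration. Note that it does not reproduce `drop = FALSE`: species that never occur in the data get no column here.

```r
library(tidyr)

# Hypothetical toy data: species "A" appears twice in Survey 1
toy <- data.frame(
  Survey    = c("Survey 1", "Survey 1", "Survey 1", "Survey 2"),
  Species2  = c("A", "A", "C", "A"),
  Abundance = c(5, 7, 3, 2)
)

# values_fn sums duplicated Survey/Species2 cells;
# values_fill supplies 0 for combinations that were never recorded
wide <- pivot_wider(
  toy,
  names_from  = Species2,
  values_from = Abundance,
  values_fn   = list(Abundance = sum),
  values_fill = list(Abundance = 0)
)

print(wide)
```

Summarizing first, as in the solution above, keeps the aggregation step explicit, which may be easier to audit when several datasets are involved.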