Mapping factors to a data frame

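My sampling data is spread over two datasets. loc describes the geographic locations, and spe contains the species found. Unfortunately, a sampling station is described by two factors (cruise and station), so I need to construct a unique identifier that can be used in both datasets.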

>loc 
    cruise station  lon lat 
1 TY1  A1 53.8073 6.7836 
2 TY1  3 53.7757 6.7009 
3 AZ7  A1 53.7764 6.6758

and

>spe 
    cruise station  species abundance 
1 TY1  A1 Ensis ensis  100 
2 TY1  A1 Magelona   5 
3 TY1  A1 Nemertea  17 
4 TY1  3 Magelona   8 
5 TY1  3  Ophelia  1200 
6 AZ7  A1  Ophelia  950 
7 AZ7  A1 Ensis ensis  89 
8 AZ7  A1  Spio   1

What I need is to add a unique identifier ID, like this:

cruise station  species abundance  ID 
1 TY1  A1 Ensis ensis  100 STA0001 
2 TY1  A1 Magelona   5 STA0001 
3 TY1  A1 Nemertea  17 STA0001 
4 TY1  3 Magelona   8 STA0002 
5 TY1  3  Ophelia  1200 STA0002 
6 AZ7  A1  Ophelia  950 STA0003 
7 AZ7  A1 Ensis ensis  89 STA0003 
8 AZ7  A1  Spio   1 STA0003

Here is the data:

loc<-data.frame(cruise=c("TY1","TY1","AZ7"),station=c("A1",3,"A1"),lon=c(53.8073, 53.7757, 53.7764),lat=c(6.7836, 6.7009, 6.6758)) 

spe<-data.frame(cruise=c(rep("TY1",5),rep("AZ7",3)),station=c(rep("A1",3),rep(3,2),rep("A1",3)),species=c("Ensis ensis", "Magelona", "Nemertea", "Magelona", "Ophelia", "Ophelia","Ensis ensis", "Spio"),abundance=c(100,5,17,8,1200,950,89,1))

Then I construct the ID for loc:

loc$ID<-paste("STA",formatC(1:nrow(loc),width=4,format="d",flag="0"),sep="")
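Not part of the original thread: base R's sprintf() builds the same zero-padded labels a little more compactly, since the %04d format pads an integer to four digits with leading zeros. A minimal sketch, assuming three stations as in loc above:

```r
# Three stations, as in the loc data frame above
n_stations <- 3

# "%04d" zero-pads each integer to width 4
ids_sprintf <- sprintf("STA%04d", seq_len(n_stations))

# The formatC() version from the question, for comparison
ids_formatC <- paste("STA", formatC(seq_len(n_stations), width = 4, format = "d", flag = "0"), sep = "")

print(ids_sprintf)
```

Both expressions produce the character vector "STA0001", "STA0002", "STA0003".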

But how do I map the ID onto spe?

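The way I found involves two nested loops, which is rather handsome for a procedural programmer like me (if nested loops can be called handsome). I'm pretty sure a two-liner in R would be more efficient and faster, but I can't figure it out. I would really like more elegance in my code.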
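One vectorised way to replace the nested loops (a sketch, not taken from the answers below) is to paste the two identifying factors into a single composite key per row and then use base R's match() to look up each spe row's position in loc. The "-" separator is an arbitrary choice; stringsAsFactors = FALSE is set so the keys are plain strings.

```r
# Sample data from the question, with strings kept as characters
loc <- data.frame(cruise = c("TY1", "TY1", "AZ7"),
                  station = c("A1", "3", "A1"),
                  lon = c(53.8073, 53.7757, 53.7764),
                  lat = c(6.7836, 6.7009, 6.6758),
                  stringsAsFactors = FALSE)
spe <- data.frame(cruise = c(rep("TY1", 5), rep("AZ7", 3)),
                  station = c(rep("A1", 3), rep("3", 2), rep("A1", 3)),
                  species = c("Ensis ensis", "Magelona", "Nemertea", "Magelona",
                              "Ophelia", "Ophelia", "Ensis ensis", "Spio"),
                  abundance = c(100, 5, 17, 8, 1200, 950, 89, 1),
                  stringsAsFactors = FALSE)
loc$ID <- paste("STA", formatC(1:nrow(loc), width = 4, format = "d", flag = "0"), sep = "")

# One composite key per row, built from the two identifying factors
key_loc <- paste(loc$cruise, loc$station, sep = "-")
key_spe <- paste(spe$cruise, spe$station, sep = "-")

# For each spe row, match() gives the index of the loc row with the same key
# (NA if the station is missing from loc); use it to pull the ID across
spe$ID <- loc$ID[match(key_spe, key_loc)]
print(spe)
```

If a station appears in spe but not in loc, its ID comes out as NA, which makes gaps easy to spot.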


2012-07-13 Janhoo

+1 and welcome to StackOverflow. I wish all new questions were this clear, with sample data, expected results and working code! – Andrie 2012-07-13 15:32:53

Actually, I think this is a case where merge in base R just works:

merge(spe, loc, all.x=TRUE) 

    cruise station  species abundance  lon lat 
1 AZ7  A1  Ophelia  950 53.7764 6.6758 
2 AZ7  A1 Ensis ensis  89 53.7764 6.6758 
3 AZ7  A1  Spio   1 53.7764 6.6758 
4 TY1  3 Magelona   8 53.7757 6.7009 
5 TY1  3  Ophelia  1200 53.7757 6.7009 
6 TY1  A1 Ensis ensis  100 53.8073 6.7836 
7 TY1  A1 Magelona   5 53.8073 6.7836 
8 TY1  A1 Nemertea  17 53.8073 6.7836
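Because merge() copies every non-key column of loc into the result, the same call also carries the ID across once loc has one. A minimal sketch, assuming loc$ID was built as in the question; naming the key columns with by = is optional here (merge would use the shared columns cruise and station anyway) but makes the join explicit:

```r
loc <- data.frame(cruise = c("TY1", "TY1", "AZ7"),
                  station = c("A1", "3", "A1"),
                  lon = c(53.8073, 53.7757, 53.7764),
                  lat = c(6.7836, 6.7009, 6.6758),
                  stringsAsFactors = FALSE)
spe <- data.frame(cruise = c(rep("TY1", 5), rep("AZ7", 3)),
                  station = c(rep("A1", 3), rep("3", 2), rep("A1", 3)),
                  species = c("Ensis ensis", "Magelona", "Nemertea", "Magelona",
                              "Ophelia", "Ophelia", "Ensis ensis", "Spio"),
                  abundance = c(100, 5, 17, 8, 1200, 950, 89, 1),
                  stringsAsFactors = FALSE)
loc$ID <- paste("STA", formatC(1:nrow(loc), width = 4, format = "d", flag = "0"), sep = "")

# Join on both identifying columns; all.x = TRUE keeps every spe row
merged <- merge(spe, loc, by = c("cruise", "station"), all.x = TRUE)
print(merged[, c("cruise", "station", "species", "abundance", "ID")])
```

Note that merge() sorts the result by the key columns, so rows are not in the original spe order.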

To find the unique identifiers, use unique():

unique(paste(loc$cruise, loc$station, sep="-")) 
[1] "TY1-A1" "TY1-3" "AZ7-A1"
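Combining unique() and paste() with match() numbers each station directly in spe. A sketch under one stated assumption: the numbering follows the order in which stations first appear in spe, which coincides with the loc numbering here only because both data frames list the stations in the same order.

```r
spe <- data.frame(cruise = c(rep("TY1", 5), rep("AZ7", 3)),
                  station = c(rep("A1", 3), rep("3", 2), rep("A1", 3)),
                  species = c("Ensis ensis", "Magelona", "Nemertea", "Magelona",
                              "Ophelia", "Ophelia", "Ensis ensis", "Spio"),
                  abundance = c(100, 5, 17, 8, 1200, 950, 89, 1),
                  stringsAsFactors = FALSE)

# Composite key per row
key <- paste(spe$cruise, spe$station, sep = "-")

# Position of each key among the distinct keys, in order of first appearance
idx <- match(key, unique(key))

# Turn the positions into zero-padded labels
spe$ID <- paste("STA", formatC(idx, width = 4, format = "d", flag = "0"), sep = "")
print(spe)
```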


2012-07-13 15:23:05 Andrie

That's the way to go. Thanks! – Janhoo 2012-07-13 15:27:10

But I still need the unique identifier. It shouldn't be that hard; I'll give it a try. – Janhoo 2012-07-13 15:41:02

@sunpyg You can use 'unique' and 'paste'. I've edited my answer. – Andrie 2012-07-13 15:48:00

You can combine factors with interaction.

If you don't care about the labels in the ID column, the solution is very simple:

loc <- within(loc, id <- interaction(cruise, station)) 
spe <- within(spe, id <- interaction(cruise, station))
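A sketch of the interaction() approach on the question's data. The drop = TRUE argument is an addition here: it removes cruise/station combinations that never occur (such as AZ7.3). Because the factor's integer codes depend on each data frame's own set of levels, the sketch compares the ids as character labels rather than as codes.

```r
loc <- data.frame(cruise = c("TY1", "TY1", "AZ7"),
                  station = c("A1", "3", "A1"),
                  lon = c(53.8073, 53.7757, 53.7764),
                  lat = c(6.7836, 6.7009, 6.6758),
                  stringsAsFactors = FALSE)
spe <- data.frame(cruise = c(rep("TY1", 5), rep("AZ7", 3)),
                  station = c(rep("A1", 3), rep("3", 2), rep("A1", 3)),
                  species = c("Ensis ensis", "Magelona", "Nemertea", "Magelona",
                              "Ophelia", "Ophelia", "Ensis ensis", "Spio"),
                  abundance = c(100, 5, 17, 8, 1200, 950, 89, 1),
                  stringsAsFactors = FALSE)

# interaction() pastes the levels with sep = "." by default, e.g. "TY1.A1"
loc <- within(loc, id <- interaction(cruise, station, drop = TRUE))
spe <- within(spe, id <- interaction(cruise, station, drop = TRUE))

print(levels(loc$id))
print(spe$id)
```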


2012-07-13 15:27:38

To show where this leads (it may be of interest):

Add the unique ID to loc as described above:

loc$ID<-paste("STA", formatC(1:nrow(loc), width=4, format="d", flag="0"), sep="")

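merge(spe, loc, all.x=TRUE), as proposed by Andrie, combines the data.frames as needed and drops any loc entries that have no corresponding rows in spe. If those should be kept, use merge(spe, loc, all.x=TRUE, all.y=TRUE) instead.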
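To see the difference between the two calls, the sketch below adds a hypothetical fourth station (cruise AZ7, station "B2", invented for illustration) that has no species records. The left join keeps only spe's 8 rows; adding all.y = TRUE also keeps the unmatched station, with NA for species and abundance.

```r
# loc with a hypothetical extra station "B2" that has no entries in spe
loc <- data.frame(cruise = c("TY1", "TY1", "AZ7", "AZ7"),
                  station = c("A1", "3", "A1", "B2"),
                  stringsAsFactors = FALSE)
loc$ID <- paste("STA", formatC(1:nrow(loc), width = 4, format = "d", flag = "0"), sep = "")

spe <- data.frame(cruise = c(rep("TY1", 5), rep("AZ7", 3)),
                  station = c(rep("A1", 3), rep("3", 2), rep("A1", 3)),
                  species = c("Ensis ensis", "Magelona", "Nemertea", "Magelona",
                              "Ophelia", "Ophelia", "Ensis ensis", "Spio"),
                  abundance = c(100, 5, 17, 8, 1200, 950, 89, 1),
                  stringsAsFactors = FALSE)

# Left join: every spe row; loc rows without a match are dropped
merged_left <- merge(spe, loc, by = c("cruise", "station"), all.x = TRUE)

# Full outer join: unmatched loc rows are kept and padded with NA
merged_full <- merge(spe, loc, by = c("cruise", "station"), all.x = TRUE, all.y = TRUE)

print(nrow(merged_left))
print(merged_full[merged_full$station == "B2", ])
```

merge(spe, loc, all = TRUE) is shorthand for setting both all.x and all.y.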

I wanted a table with the abundance of every species at each station in loc, which is given by

as.data.frame.matrix(xtabs(abundance ~ ID + species, merge(spe, loc, all.x=TRUE))) 
     Ensis ensis Magelona Nemertea Ophelia Spio 
STA0001   100  5  17  0 0 
STA0002   0  8  0 1200 0 
STA0003   89  0  0  950 1
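A self-contained version of the same pipeline, for anyone who wants to run it end to end. xtabs() sums abundance over every ID × species combination and fills combinations with no records with 0 (not NA); as.data.frame.matrix() then turns the two-way table into a wide data frame with the IDs as row names.

```r
loc <- data.frame(cruise = c("TY1", "TY1", "AZ7"),
                  station = c("A1", "3", "A1"),
                  lon = c(53.8073, 53.7757, 53.7764),
                  lat = c(6.7836, 6.7009, 6.6758),
                  stringsAsFactors = FALSE)
spe <- data.frame(cruise = c(rep("TY1", 5), rep("AZ7", 3)),
                  station = c(rep("A1", 3), rep("3", 2), rep("A1", 3)),
                  species = c("Ensis ensis", "Magelona", "Nemertea", "Magelona",
                              "Ophelia", "Ophelia", "Ensis ensis", "Spio"),
                  abundance = c(100, 5, 17, 8, 1200, 950, 89, 1),
                  stringsAsFactors = FALSE)
loc$ID <- paste("STA", formatC(1:nrow(loc), width = 4, format = "d", flag = "0"), sep = "")

# Attach the station ID to every species record
merged <- merge(spe, loc, by = c("cruise", "station"), all.x = TRUE)

# Cross-tabulate: rows = station IDs, columns = species, cells = summed abundance
tab <- as.data.frame.matrix(xtabs(abundance ~ ID + species, data = merged))
print(tab)
```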

Thanks to Andrie and Mr. Cotton.



2012-07-16 09:27:10 Janhoo
