2016-02-19

Select rows by most recent year

I have a data frame of vegetation metrics for x units and y sampling stations (multiple stations within each unit), collected over multiple years. I want to select all vegetation data for each unit from the most recent year in which data were collected for that unit. Here is an example of my data frame:

veg <- c("tree","grass","tree","grass","tree","grass","tree","grass")
cover <- c(0.97,0.21,0.35,0.67,0.45,0.72,0.27,0.67)
unit <- c("U1","U1","U1","U1","U2","U2","U2","U2")
station <- c("A1","A1","A2","A2","A3","A3","A4","A4")
year <- c(2015,2015,2014,2014,2013,2013,2014,2014)
df <- data.frame(veg,cover,unit,station,year)

The data frame looks like this:

    veg cover unit station year
1  tree  0.97   U1      A1 2015
2 grass  0.21   U1      A1 2015
3  tree  0.35   U1      A2 2014
4 grass  0.67   U1      A2 2014
5  tree  0.45   U2      A3 2013
6 grass  0.72   U2      A3 2013
7  tree  0.27   U2      A4 2014
8 grass  0.67   U2      A4 2014

I would like it to look like this:

    veg cover unit station year
1  tree  0.97   U1      A1 2015
2 grass  0.21   U1      A1 2015
3  tree  0.27   U2      A4 2014
4 grass  0.67   U2      A4 2014

Any help would be much appreciated.


Why don't you want the most recent years? Could you define "recent years"? – MaxPD

Answers


Here is how to do it without any packages.

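# Split df by unit and keep, within each unit, only the rows from its latest year
df.by     <- by(df, df$unit, FUN = function(t) t[t$year == max(t$year), ])
# Merge the per-unit data frames back into a single data frame
df.recent <- Reduce(function(...) merge(..., all = TRUE), df.by)
df.recent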

The output is:

> df.recent
    veg cover unit station year
1 grass  0.21   U1      A1 2015
2 grass  0.67   U2      A4 2014
3  tree  0.27   U2      A4 2014
4  tree  0.97   U1      A1 2015

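In the first line, we use the function by to split the data frame into subsets according to the factor df$unit. For each subset (that is, for each unit), the anonymous function function(t) t[t$year == max(t$year),] extracts the rows from the most recent year.

df.by is a list of data frames, each containing only the rows from the most recent year for one unit.

In the second line, we use the merge function to combine all the data frames in df.by. This use of Reduce with merge is explained in "Simultaneously merge multiple data.frames in a list".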
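
As a minimal sketch of the same idea, assuming the example data from the question: the snippet below rebuilds df, shows that by yields one element per unit, and then stacks the pieces with do.call(rbind, ...) instead of Reduce and merge. The rbind variant is a swapped-in alternative, not the method described above; it keeps the rows in their original order, whereas merge sorts by the key columns by default. The name df.stacked is made up for illustration.

```r
# Example data from the question
df <- data.frame(
  veg     = rep(c("tree", "grass"), times = 4),
  cover   = c(0.97, 0.21, 0.35, 0.67, 0.45, 0.72, 0.27, 0.67),
  unit    = rep(c("U1", "U2"), each = 4),
  station = rep(c("A1", "A2", "A3", "A4"), each = 2),
  year    = c(2015, 2015, 2014, 2014, 2013, 2013, 2014, 2014)
)

# One data frame per unit, restricted to that unit's latest year
df.by <- by(df, df$unit, FUN = function(t) t[t$year == max(t$year), ])
names(df.by)  # one element per unit

# Alternative to Reduce/merge (hypothetical name df.stacked):
# stack the per-unit pieces on top of each other
df.stacked <- do.call(rbind, df.by)
rownames(df.stacked) <- NULL  # drop the unit-prefixed row names
df.stacked
```

Because every piece has the same columns, stacking is enough here; merge is only needed when the data frames being combined have different columns.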


Thank you, that did it. – omwrichmond


This should get you your answer, assuming you want the most recent row for each veg/unit combination. Is that correct?

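library(dplyr)

df %>%
    group_by(veg, unit) %>%   # one group per vegetation type within each unit
    arrange(desc(year)) %>%   # newest year first
    slice(1)                  # keep the first (most recent) row of each group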
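
If a unit has several stations sampled in its most recent year, slice(1) keeps only one row per veg/unit group. A hedged variant, assuming the goal is to keep every row from each unit's latest year as the question describes, is to group by unit only and filter on the maximum year. This is a sketch using the example data, not the code from the answer above.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  veg     = rep(c("tree", "grass"), times = 4),
  cover   = c(0.97, 0.21, 0.35, 0.67, 0.45, 0.72, 0.27, 0.67),
  unit    = rep(c("U1", "U2"), each = 4),
  station = rep(c("A1", "A2", "A3", "A4"), each = 2),
  year    = c(2015, 2015, 2014, 2014, 2013, 2013, 2014, 2014)
)

df.recent <- df %>%
    group_by(unit) %>%              # one group per unit
    filter(year == max(year)) %>%   # keep all rows from that unit's latest year
    ungroup()                       # return an ungrouped result
df.recent
```

On the example data this keeps the A1 rows for U1 and the A4 rows for U2, in their original order.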