2016-11-29

How do I calculate yearly averages for unbalanced panel data in R?

I have quarterly unbalanced panel data that looks like this:

Firm Date Var_1    
AAA 19701130 24.46    
AAA 19701231 NA    
AAA 19710131 NA    
AAA 19710228 34.19325     
AAA 19710331 NA    
AAA 19710430 NA    
AAA 19710531 29.0235    
AAA 19710630 NA    
AAA 19710731 NA    
AAA 19710831 16.256875    
AAA 19710930 NA    
AAA 19711031 NA    
AAA 19711130 17.22125     
AAA 19711231 NA    
BBB 19730630 4.57     
BBB 19730731 NA    
BBB 19730831 NA    
BBB 19730930 8.736    
BBB 19731031 NA    
BBB 19731130 NA    
BBB 19731231 4.97     
BBB 19740131 NA    
BBB 19740228 NA    
BBB 19740331 6.85125    
BBB 19740430 NA    
BBB 19740531 NA    
BBB 19740630 6.87225    
BBB 19740731 NA    
BBB 19740831 NA    
BBB 19740930 5.454875     
BBB 19741031 NA    
BBB 19741130 NA    
BBB 19741231 4.56875    
BBB 19750131 NA    
BBB 19750228 NA    
BBB 19750331 6.276    
BBB 19750430 NA    
BBB 19750531 NA    
BBB 19750630 6.0145    
BBB 19750731 NA    
BBB 19750831 NA    
BBB 19750930 8.376    
BBB 19751031 NA    
BBB 19751130 NA    
BBB 19751231 9.17875    

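The actual data runs to tens of thousands of rows. Note that each firm reports at different month ends. How can I calculate the mean of Var_1 for each firm in each year? The final result should be yearly rather than quarterly. Ideally it would look like this: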

Firm Date Var_1 
AAA  1970 24.46 
AAA  1971 24.17 
BBB  1973 6.09  
BBB  1974 5.94  
BBB  1975 7.46  

Answer

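We can use one of the group-by functions. After grouping by 'Firm' and the first four characters (the year) of 'Date', take the mean of 'Var_1', skipping the NA values: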

library(dplyr) 
df1 %>% 
    group_by(Firm, Date = substr(Date, 1,4)) %>% 
    summarise(Var_1 = round(mean(Var_1, na.rm = TRUE), 2)) 
# Firm Date Var_1 
# <chr> <chr> <dbl> 
#1 AAA 1970 24.46 
#2 AAA 1971 24.17 
#3 BBB 1973 6.09 
#4 BBB 1974 5.94 
#5 BBB 1975 7.46 
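
The year-extraction step in the grouping above can be tried on its own. The following is a minimal sketch in base R, assuming Date is stored as a number in yyyymmdd form as it appears in the question; the variable names (dates, year_chr, year_int, year_from_date) are hypothetical and only for illustration.

```r
# Three Date values taken from the question's sample, assumed numeric yyyymmdd
dates <- c(19701130, 19710228, 19731231)

# substr() coerces non-character input to character, so the year is returned as text
year_chr <- substr(dates, 1, 4)

# Integer division by 10000 drops the mmdd part and keeps the year numeric
year_int <- dates %/% 10000

# Parsing to a real Date first is an alternative if other date parts are needed later
year_from_date <- format(as.Date(as.character(dates), format = "%Y%m%d"), "%Y")
```

All three give the same years; the difference is only whether the grouping column ends up as character or numeric, which may matter if the yearly result is later merged with other data.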

Or with aggregate from base R:

aggregate(Var_1~., transform(df1, Date = substr(Date, 1, 4)), FUN = mean, na.rm = TRUE)
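
For a self-contained check, here is a sketch of the base R approach on a small subset of the question's rows. It assumes Date is numeric yyyymmdd, and it names the grouping column Year (a hypothetical name, not in the original) so the raw Date column is left intact. The formula method of aggregate drops rows with NA by default (na.action = na.omit), so the NA months do not affect the means.

```r
# Subset of the question's data: firm AAA for 1970-1971 and firm BBB for 1973
df1 <- data.frame(
  Firm  = c("AAA", "AAA", "AAA", "AAA", "AAA", "AAA", "BBB", "BBB", "BBB", "BBB"),
  Date  = c(19701130, 19710131, 19710228, 19710531, 19710831, 19711130,
            19730630, 19730731, 19730930, 19731231),
  Var_1 = c(24.46, NA, 34.19325, 29.0235, 16.256875, 17.22125,
            4.57, NA, 8.736, 4.97)
)

# Add a numeric Year column, then average Var_1 within each Firm/Year pair
res <- aggregate(Var_1 ~ Firm + Year,
                 data = transform(df1, Year = Date %/% 10000),
                 FUN = mean)

# Round to two decimals and sort to match the layout asked for in the question
res$Var_1 <- round(res$Var_1, 2)
res <- res[order(res$Firm, res$Year), ]
print(res)
```

On this subset the result has one row per firm and year (AAA 1970, AAA 1971, BBB 1973), with means matching the desired output in the question.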