2011-02-10

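R: Calculating 5-year averages of panel data

I have a balanced panel by country, from 1951 to 2007, in a data frame. I'd like to transform it into a new data frame of five-year averages of my other variables. When I sat down to do this, I realized the only way I could think of was a for loop, and decided it was time to ask for help.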

So, is there an easy way to turn data that looks like this:

country country.isocode year  POP   ci  grgdpch 
Argentina    ARG 1951 17517.34 18.445022145 3.4602044759 
Argentina    ARG 1952 17876.96 17.76066507 -7.887407586 
Argentina    ARG 1953 18230.82 18.365255769 2.3118720688 
Argentina    ARG 1954 18580.56 16.982113434 1.5693778844 
Argentina    ARG 1955 18927.82 17.488907008 5.3690276523 
Argentina    ARG 1956 19271.51 15.907756547 0.3125559183 
Argentina    ARG 1957 19610.54 17.028450999 2.4896639667 
Argentina    ARG 1958 19946.54 17.541597134 5.0025894968 
Argentina    ARG 1959 20281.15 16.137310492 -6.763501447 
Argentina    ARG 1960 20616.01 20.519539628 8.481742144 
... 
Venezuela    VEN 1997 22361.80 21.923577413 5.603872759 
Venezuela    VEN 1998 22751.36 24.451736863 -0.781844721 
Venezuela    VEN 1999 23128.64 21.585034168 -8.728234466 
Venezuela    VEN 2000 23492.75 20.224310777 2.6828641218 
Venezuela    VEN 2001 23843.87 23.480311721 0.2476965412 
Venezuela    VEN 2002 24191.77 16.290691319 -8.02535946 
Venezuela    VEN 2003 24545.43 10.972153646 -8.341989049 
Venezuela    VEN 2004 24904.62 17.147693312 14.644028806 
Venezuela    VEN 2005 25269.18 18.805970212 7.3156977879 
Venezuela    VEN 2006 25641.46 22.191098769 5.2737381326 
Venezuela    VEN 2007 26023.53 26.518210052 4.1367897561 

into this:

country country.isocode period AvPOP  Avci Avgrgdpch 
Argentina    ARG  1 18230 17.38474 1.423454 
... 
Venezuela    VEN  12 25274 21.45343 5.454334 

Do I need a specific panel data package to transform this data frame? Or is there another easy way to do this that I'm missing?

Answers


This is what aggregate is made for:

Df <- data.frame(
    year=rep(1951:1970,2), 
    country=rep(c("Arg","Ven"),each=20), 
    var1 = c(1:20,51:70), 
    var2 = c(20:1,70:51) 
) 

# Cut the years into 5-year bins closed on the left: [1951,1956), [1956,1961), ...
Level <- cut(Df$year, seq(1951, 1971, by = 5), right = FALSE)
id <- c("var1", "var2")

> aggregate(Df[id],list(Df$country,Level),mean) 
    Group.1  Group.2 var1 var2 
1  Arg [1951,1956) 3 18 
2  Ven [1951,1956) 53 68 
3  Arg [1956,1961) 8 13 
4  Ven [1956,1961) 58 63 
5  Arg [1961,1966) 13 8 
6  Ven [1961,1966) 63 58 
7  Arg [1966,1971) 18 3 
8  Ven [1966,1971) 68 53 

The only thing left you might want to do is rename the categories and the variables.
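A minimal sketch of that renaming step, reusing the toy Df from this answer. The column names country and period and the "Av" prefix are illustrative choices that mirror the desired output in the question; as.integer() turns the cut() factor into the period numbers 1, 2, 3, ...

```r
# Toy panel from above: two countries, 20 years each
Df <- data.frame(
    year = rep(1951:1970, 2),
    country = rep(c("Arg", "Ven"), each = 20),
    var1 = c(1:20, 51:70),
    var2 = c(20:1, 70:51)
)

# 5-year bins, closed on the left: [1951,1956), [1956,1961), ...
Level <- cut(Df$year, seq(1951, 1971, by = 5), right = FALSE)
id <- c("var1", "var2")

# A named 'by' list names the grouping columns directly;
# as.integer() maps the bins to period numbers 1, 2, 3, 4
res <- aggregate(Df[id],
                 by = list(country = Df$country, period = as.integer(Level)),
                 FUN = mean)

# Prefix the averaged columns with "Av" (illustrative naming only)
names(res)[match(id, names(res))] <- paste0("Av", id)
res
```

For the full 1951-2007 panel the breaks would have to reach past 2007, e.g. seq(1951, 2011, by = 5), which yields 12 periods, the last one covering only 2006 and 2007.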


For this type of problem, the plyr package is truly amazing. Here is some code that essentially gives you a one-liner plus a small helper function.

library(plyr) 
library(zoo) 
library(pwt) 

# First recreate the dataset, using the pwt package
data(pwt6.3) 
pwt <- pwt6.3[ 
     pwt6.3$country %in% c("Argentina", "Venezuela"), 
     c("country", "isocode", "year", "pop", "ci", "rgdpch") 
] 

# Use rollmean() from zoo as the basis for a 5-period rolling mean
rollmean5 <- function(x){ 
    rollmean(x, 5) 
} 

# Use ddply() in plyr package to create rolling average per country 
pwt.ma <- ddply(pwt, .(country), numcolwise(rollmean5)) 

Here is the output from that:

> head(pwt, 10) 
      country isocode year  pop  ci rgdpch 
ARG-1950 Argentina  ARG 1950 17150.34 13.29214 7736.338 
ARG-1951 Argentina  ARG 1951 17517.34 18.44502 8004.031 
ARG-1952 Argentina  ARG 1952 17876.96 17.76067 7372.721 
ARG-1953 Argentina  ARG 1953 18230.82 18.36526 7543.169 
ARG-1954 Argentina  ARG 1954 18580.56 16.98211 7661.550 
ARG-1955 Argentina  ARG 1955 18927.82 17.48891 8072.900 
ARG-1956 Argentina  ARG 1956 19271.51 15.90776 8098.133 
ARG-1957 Argentina  ARG 1957 19610.54 17.02845 8299.749 
ARG-1958 Argentina  ARG 1958 19946.54 17.54160 8714.951 
ARG-1959 Argentina  ARG 1959 20281.15 16.13731 8125.515 

> head(pwt.ma) 
    country year  pop  ci rgdpch 
1 Argentina 1952 17871.20 16.96904 7663.562 
2 Argentina 1953 18226.70 17.80839 7730.874 
3 Argentina 1954 18577.53 17.30094 7749.694 
4 Argentina 1955 18924.25 17.15450 7935.100 
5 Argentina 1956 19267.39 16.98977 8169.456 
6 Argentina 1957 19607.51 16.82080 8262.250 

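Note that rollmean() computes a centered moving average by default. You can change this to get a left- or right-aligned moving average by passing the align argument through the helper function.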
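A small sketch of what the alignment does, assuming zoo's rollmean() with its fill and align arguments. For a plain numeric vector (which is what ddply hands to the helper), the window means themselves do not depend on the alignment; it only becomes visible once the result is padded back to full length, here with fill = NA. The helper below is a hypothetical variant of rollmean5 with the alignment exposed.

```r
library(zoo)

x <- c(1, 2, 3, 4, 5, 6, 7)

# fill = NA pads the result back to length(x), so the alignment
# decides where each 5-period window mean is placed
centre <- rollmean(x, 5, fill = NA)                   # NA NA 3 4 5 NA NA
right  <- rollmean(x, 5, fill = NA, align = "right")  # NA NA NA NA 3 4 5

# Hypothetical helper: like rollmean5 above, but with the alignment exposed
rollmean5 <- function(x, align = "center") {
    rollmean(x, 5, fill = NA, align = align)
}
```

With plyr, numcolwise() forwards extra arguments to the function, so something like ddply(pwt, .(country), numcolwise(rollmean5, align = "right")) should give trailing 5-year means per country.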

Edit:

@Joris Meys kindly pointed out that you may actually be after averages over five-year periods.

Here is the modified code to do that:

pwt$period <- cut(pwt$year, seq(1900, 2100, 5)) 
pwt.ma <- ddply(pwt, .(country, period), numcolwise(mean)) 
pwt.ma 

And the output:

> pwt.ma 
    country  period year  pop  ci rgdpch 
1 Argentina (1945,1950] 1950.0 17150.336 13.29214 7736.338 
2 Argentina (1950,1955] 1953.0 18226.699 17.80839 7730.874 
3 Argentina (1955,1960] 1958.0 19945.149 17.42693 8410.610 
4 Argentina (1960,1965] 1963.0 21616.623 19.09067 9000.918 
5 Argentina (1965,1970] 1968.0 23273.736 18.89005 10202.665 
6 Argentina (1970,1975] 1973.0 25216.339 19.70203 11348.321 
7 Argentina (1975,1980] 1978.0 27445.430 23.34439 11907.939 
8 Argentina (1980,1985] 1983.0 29774.778 17.58909 10987.538 
9 Argentina (1985,1990] 1988.0 32095.227 15.17531 10313.375 
10 Argentina (1990,1995] 1993.0 34399.829 17.96758 11221.807 
11 Argentina (1995,2000] 1998.0 36512.422 19.03551 12652.849 
12 Argentina (2000,2005] 2003.0 38390.719 15.22084 12308.493 
13 Argentina (2005,2010] 2006.5 39831.625 21.11783 14885.227 
14 Venezuela (1945,1950] 1950.0 5009.006 41.07972 7067.947 
15 Venezuela (1950,1955] 1953.0 5684.009 44.60849 8132.041 
16 Venezuela (1955,1960] 1958.0 6988.078 37.87946 9468.001 
17 Venezuela (1960,1965] 1963.0 8451.073 26.93877 9958.935 
18 Venezuela (1965,1970] 1968.0 10056.910 28.66512 11083.242 
19 Venezuela (1970,1975] 1973.0 11903.185 32.02671 12862.966 
20 Venezuela (1975,1980] 1978.0 13927.882 36.35687 13530.556 
21 Venezuela (1980,1985] 1983.0 16082.694 22.21093 10762.718 
22 Venezuela (1985,1990] 1988.0 18382.964 19.48447 10376.123 
23 Venezuela (1990,1995] 1993.0 20680.645 19.82371 10988.096 
24 Venezuela (1995,2000] 1998.0 22739.062 20.93509 10837.580 
25 Venezuela (2000,2005] 2003.0 24550.973 17.33936 10085.322 
26 Venezuela (2005,2010] 2006.5 25832.495 24.35465 11790.497 

He didn't ask for rolling averages... – 2011-02-10 09:29:51


There is a base stats answer and a plyr answer, so for completeness here is a dplyr-based answer. Using Joris's toy data, we have

Df <- data.frame(
    year=rep(1951:1970,2), 
    country=rep(c("Arg","Ven"),each=20), 
    var1 = c(1:20,51:70), 
    var2 = c(20:1,70:51) 
) 

Now, using cut to create the periods, we can group by them and get the means:

library(dplyr)
Df %>% mutate(period = cut(year, seq(1951, 1971, by = 5), right = FALSE)) %>%
    group_by(country, period) %>%
    summarise(V1 = mean(var1), V2 = mean(var2))

Source: local data frame [8 x 4] 
Groups: country 

    country  period V1 V2 
1  Arg [1951,1956) 3 18 
2  Arg [1956,1961) 8 13 
3  Arg [1961,1966) 13 8 
4  Arg [1966,1971) 18 3 
5  Ven [1951,1956) 53 68 
6  Ven [1956,1961) 58 63 
7  Ven [1961,1966) 63 58 
8  Ven [1966,1971) 68 53
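With many variables, writing one mean() per column gets tedious. A sketch assuming dplyr 1.0 or later (newer than this answer): across() applies mean to each listed column, and its .names glue pattern adds the "Av" prefix the question asks for. The prefix is an illustrative choice.

```r
library(dplyr)

Df <- data.frame(
    year = rep(1951:1970, 2),
    country = rep(c("Arg", "Ven"), each = 20),
    var1 = c(1:20, 51:70),
    var2 = c(20:1, 70:51)
)

res <- Df %>%
    mutate(period = cut(year, seq(1951, 1971, by = 5), right = FALSE)) %>%
    group_by(country, period) %>%
    # one mean per listed column, named Avvar1 and Avvar2
    summarise(across(c(var1, var2), mean, .names = "Av{.col}"), .groups = "drop")
res
```

Replacing c(var1, var2) with a tidyselect helper such as where(is.numeric) (after excluding year) would average every numeric column at once.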