2014-09-19

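Add a column of means by category to a data frame that contains NAs

I would like to add a column of group means to my data frame with some code, even though the values contain NAs (which I think rules out a lot of possibilities).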

The best I have managed so far is:

TestData <- data.frame(geo=c(rep("AT",4),rep("DE",4)),time=c(rep(c(1990:1993),2)),value=c(NA,4,20,6,NA,NA,5,3)) 

mean <- aggregate(value~geo, TestData, mean) 

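which calculates the correct means by category (geo). How can I add them to the data frame, so that the mean is not just a single observation per group but is shown at every time point? I was thinking of ddply, but could not get it to work. The data frame I am looking for is: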

  geo time value mean  (or optionally; either is fine for me)
1  AT 1990    NA   10  NA
2  AT 1991     4   10  10
3  AT 1992    20   10  10
4  AT 1993     6   10  10
5  DE 1990    NA    4  NA
6  DE 1991    NA    4  NA
7  DE 1992     5    4   4
8  DE 1993     3    4   4

Any help would be greatly appreciated!

Answer


Try:

testData1 <- within(TestData, {
    # Group mean of value within each geo, ignoring NA, repeated on every row
    Mean <- ave(value, geo, FUN = function(x) mean(x, na.rm = TRUE))
    # Skip this step if you want the mean on rows where value is NA too
    Mean[is.na(value)] <- NA
})


testData1
#   geo time value Mean
# 1  AT 1990    NA   NA
# 2  AT 1991     4   10
# 3  AT 1992    20   10
# 4  AT 1993     6   10
# 5  DE 1990    NA   NA
# 6  DE 1991    NA   NA
# 7  DE 1992     5    4
# 8  DE 1993     3    4
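For comparison, the aggregate() attempt from the question can also be finished by merging the per-group means back onto the original rows. This is only a minimal sketch: the names group_means and merged are illustrative and not from the question, and it relies on the formula interface of aggregate() dropping NA rows by default.

```r
# Same example data as in the question.
TestData <- data.frame(geo = c(rep("AT", 4), rep("DE", 4)),
                       time = rep(1990:1993, 2),
                       value = c(NA, 4, 20, 6, NA, NA, 5, 3))

# The formula interface of aggregate() uses na.action = na.omit by default,
# so rows with NA in value are dropped before the mean is taken.
group_means <- aggregate(value ~ geo, TestData, mean)
names(group_means)[2] <- "Mean"   # illustrative column name

# Join the one-row-per-geo means back onto every row of the original data.
merged <- merge(TestData, group_means, by = "geo")
merged <- merged[order(merged$geo, merged$time), ]
```

Here merged carries the group mean on every row, including rows where value is NA. The ave() approach gets there in one step because ave() returns a vector as long as its input.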

If you want to find the mean of multiple columns whose names start with "value":

For example:

TestData1 <- TestData 
TestData1$value2 <- c(4, NA, 25, NA, NA, 10,5, 2) 


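library(dplyr)

# Note: mutate_each() and funs() are deprecated in later dplyr releases,
# where across() is the suggested replacement.
res <- left_join(TestData1,
                 TestData1 %>%
                   group_by(geo) %>%
                   # replace each value* column by its group mean, ignoring NA
                   mutate_each(funs(mean = mean(., na.rm = TRUE)), starts_with("value")),
                 by = c("geo", "time"))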


colnames(res) <- gsub("\\.y$", ".mean", colnames(res)) 
res 
#   geo time value.x value2.x value.mean value2.mean
# 1  AT 1990      NA        4         10   14.500000
# 2  AT 1991       4       NA         10   14.500000
# 3  AT 1992      20       25         10   14.500000
# 4  AT 1993       6       NA         10   14.500000
# 5  DE 1990      NA       NA          4    5.666667
# 6  DE 1991      NA       10          4    5.666667
# 7  DE 1992       5        5          4    5.666667
# 8  DE 1993       3        2          4    5.666667
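If you would rather not depend on dplyr, the same multi-column result can probably be obtained in base R by applying ave() to every column whose name starts with "value". This is a sketch under assumptions: the helper group_mean, the result name res2 and the ".mean" suffix are my own illustrative choices.

```r
# Example data with a second value column, as above.
TestData1 <- data.frame(geo = c(rep("AT", 4), rep("DE", 4)),
                        time = rep(1990:1993, 2),
                        value = c(NA, 4, 20, 6, NA, NA, 5, 3),
                        value2 = c(4, NA, 25, NA, NA, 10, 5, 2))

# Names of all columns that start with "value".
value_cols <- grep("^value", names(TestData1), value = TRUE)

# Hypothetical helper: group mean of x within g, ignoring NA,
# returned as a vector of the same length as x.
group_mean <- function(x, g) ave(x, g, FUN = function(v) mean(v, na.rm = TRUE))

# Add one "<column>.mean" column per matching column.
res2 <- TestData1
res2[paste0(value_cols, ".mean")] <- lapply(TestData1[value_cols],
                                            group_mean, g = TestData1$geo)
```

Unlike the join above, this keeps the original column names (value, value2) and appends value.mean and value2.mean, so no renaming with gsub() is needed.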

Exactly what I was looking for, thank you very much! :-) – 2014-09-19 13:20:46