2013-02-11 64 views
3

I am working with a large dataset; a sample is shown below. Most of the individual files I need to process should contain more than one day's worth of data.

Date <- c("05/12/2012 05:00:00", "05/12/2012 06:00:00", "05/12/2012 07:00:00", 
      "05/12/2012 08:00:00", "06/12/2012 07:00:00", "06/12/2012 08:00:00", 
      "07/12/2012 05:00:00", "07/12/2012 06:00:00", "07/12/2012 07:00:00", 
      "07/12/2012 08:00:00") 
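# parse including seconds; a fixed time zone keeps as.Date() from shifting early-morning rows to the previous day
Date <- strptime(Date, "%d/%m/%Y %H:%M:%S", tz = "UTC") 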
c <- c("0","1","5","4","6","8","0","3","10","6") 
c <- as.numeric(c) 
df1 <- data.frame(Date,c,stringsAsFactors = FALSE) 

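I want to be left with only one day's worth of data. That day should be the one with the most data points. If two days are tied for the maximum number of data points, I want to select the day on which the highest single value was recorded.

In the example data frame above, I would be left with 7 December. It has 4 data points (as does 5 December), but it has the highest value recorded across those two days (i.e. 10).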
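
To make the tie concrete, the per-day counts and maxima for the sample can be tabulated. This is only a minimal sketch in base R: the names `value`, `day` and `day_summary` are made up for illustration, and the UTC time zone is an assumption so that the dates do not depend on the local time zone.

```r
# Rebuild the sample data (UTC is an assumption made for this sketch)
Date <- as.POSIXct(c("05/12/2012 05:00:00", "05/12/2012 06:00:00",
                     "05/12/2012 07:00:00", "05/12/2012 08:00:00",
                     "06/12/2012 07:00:00", "06/12/2012 08:00:00",
                     "07/12/2012 05:00:00", "07/12/2012 06:00:00",
                     "07/12/2012 07:00:00", "07/12/2012 08:00:00"),
                   format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
value <- c(0, 1, 5, 4, 6, 8, 0, 3, 10, 6)

# Calendar day of each record, as a character label
day <- format(Date, "%Y-%m-%d")

# Number of records and largest value per day
n_per_day   <- tapply(value, day, length)
max_per_day <- tapply(value, day, max)

day_summary <- data.frame(day = names(n_per_day),
                          n   = as.vector(n_per_day),
                          max = as.vector(max_per_day))
print(day_summary)
```

5 December and 7 December both have 4 records, so the count alone cannot decide between them; the per-day maximum is what separates the two.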

Answers

4

Here is a solution with tapply.

# count rows per day and find maximum c value 
res <- with(df1, tapply(c, as.Date(Date), function(x) c(length(x), max(x)))) 

# order these two values in decreasing order and find the associated day 
# (at top position): 
maxDate <- names(res)[order(sapply(res, "[", 1), 
          sapply(res, "[", 2), decreasing = TRUE)[1]] 

# subset data frame: 
subset(df1, as.character(as.Date(Date)) %in% maxDate) 

                  Date  c
7  2012-12-07 05:00:00  0
8  2012-12-07 06:00:00  3
9  2012-12-07 07:00:00 10
10 2012-12-07 08:00:00  6
4

A data.table solution:

library(data.table) 
dt <- data.table(df1) 
# get just the date 
dt[, day := as.Date(Date)] 
setkey(dt, "day") 
# get total entries (N) and max(c) for each day-group 
dt <- dt[, `:=`(N = .N, mc = max(c)), by=day] 
setkey(dt, "N") 
# filter by maximum of N 
dt <- dt[J(max(N))] 
setkey(dt, "mc") 
# settle ties with maximum of c 
dt <- dt[J(max(mc))] 
dt[, c("N", "mc", "day") := NULL] 
print(dt) 

#                   Date  c
# 1: 2012-12-07 05:00:00  0
# 2: 2012-12-07 06:00:00  3
# 3: 2012-12-07 07:00:00 10
# 4: 2012-12-07 08:00:00  6
3

And for completeness, here is one with plyr:

library(plyr) 
df1$day <- strftime(df1$Date, "%d/%m/%Y") 
tmp <- ddply(df1[,c("day","c")], .(day), summarize, nb=length(c), max=max(c)) 
tmp <- tmp[order(tmp$nb, tmp$max, decreasing=TRUE),] 
df1[df1$day==tmp$day[1],] 

Which gives:

                  Date  c        day
7  2012-12-07 05:00:00  0 07/12/2012
8  2012-12-07 06:00:00  3 07/12/2012
9  2012-12-07 07:00:00 10 07/12/2012
10 2012-12-07 08:00:00  6 07/12/2012