2017-09-02

Get avg_delay by destination

I have a data frame called flights_delay.

In this data frame, destinations repeat in the "dest" column. I am trying to get the average delay ("avg_delay" column) for each destination ("dest" column). I tried this code:

sum_avg_delay <- aggregate(avg_delay~dest,flights_delay,sum)$avg_delay 

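Unfortunately, this gives me a numeric vector without any destination labels.

I also tried the dplyr::summarise function, but that returns an error.

There must be a simpler way to get the average delay by destination.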

Probably `aggregate(avg_delay ~ dest, flights_delay, sum)` for a 2-column data.frame, or `with(flights_delay, tapply(avg_delay, dest, sum))` for a named vector. – lmo
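The named-vector route from the comment can be sketched as follows. This is only an illustration: `df` is a small made-up stand-in for flights_delay (assumed to have the same two columns), and `mean` is used in place of `sum` because the question asks for an average.

```r
# Hypothetical stand-in for flights_delay, with the same two columns
df <- data.frame(dest = c("IAH", "IAH", "MIA", "BQN", "ATL", "ATL"),
                 avg_delay = c(13, 24, 35, -19, -31, 8))

# tapply() applies mean() to avg_delay within each dest and
# names each result by its destination
delay_by_dest <- with(df, tapply(avg_delay, dest, mean))
print(delay_by_dest)

# Individual destinations can then be looked up by name
print(delay_by_dest["IAH"])
```

Unlike the `$avg_delay` extraction in the question, the labels stay attached to the values here.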

Answers

You're on the right track; just simplify by keeping the whole result instead of extracting `$avg_delay`:

df <- data.frame(dest=c("IAH","IAH","MIA","BQN","ATL","ATL"), 
      avg_delay=c(13,24,35,-19,-31,8)) 

aggregate(avg_delay ~ dest, data = df, FUN = sum)

  dest avg_delay
1  ATL       -23
2  BQN       -19
3  IAH        37
4  MIA        35
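
Since the question asks for the average rather than the total, the same call should work with `mean` in place of `sum`. A minimal sketch on the toy data above (the name `mean_delay` is just illustrative):

```r
# Same toy data as above; the real flights_delay is assumed to have these two columns
df <- data.frame(dest = c("IAH", "IAH", "MIA", "BQN", "ATL", "ATL"),
                 avg_delay = c(13, 24, 35, -19, -31, 8))

# Keep the full data frame (no $avg_delay) so each mean stays next to its destination
mean_delay <- aggregate(avg_delay ~ dest, data = df, FUN = mean)
print(mean_delay)
```

The result is a two-column data frame with one row per destination, so it can be merged or plotted directly.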

Here is an option using dplyr:

suppressPackageStartupMessages(library(dplyr)) 

df <- data.frame(dest=c("IAH","IAH","MIA","BQN","ATL","ATL"), 
       avg_delay=c(13,24,35,-19,-31,8)) 

# average delay by destination 
df %>% 
    group_by(dest) %>% 
    summarise(avg_delay = mean(avg_delay)) 
#> # A tibble: 4 x 2
#>     dest avg_delay
#>   <fctr>     <dbl>
#> 1    ATL     -11.5
#> 2    BQN     -19.0
#> 3    IAH      18.5
#> 4    MIA      35.0

# sum of average delay by destination 
df %>% 
    group_by(dest) %>% 
    summarise(avg_delay = sum(avg_delay)) 
#> # A tibble: 4 x 2
#>     dest avg_delay
#>   <fctr>     <dbl>
#> 1    ATL       -23
#> 2    BQN       -19
#> 3    IAH        37
#> 4    MIA        35