dplyr: manipulating rows by group with mutate

I have the following dataset:

x <- data.frame(Postcode = c(1, 2, 3, 4, 5, 6), 
       Latitude = c(3.1, 3.2, 3.3, 3.3, 3.4, 3.4), 
       Longitude = c(100, 101, 102, 102, 103, 104), 
       Exposure = c(1, 2, 3, 4, 5, 6))

I am trying to transform the data in x into:

x <- data.frame(Postcode = c(1, 2, 3, 4, 5, 6), 
       Latitude = c(3.1, 3.2, 3.3, 3.3, 3.4, 3.4), 
       Longitude = c(100, 101, 102, 102, 103, 104), 
       Exposure = c(1, 2, 3, 4, 5, 6), 
       coords = c("3.1, 100", "3.2, 101", "3.3, 102", "3.3, 102", 
          "3.4, 103", "3.4, 104"), 
       postcode = c("1", "2", "3,4", "3,4", "5", "6"), 
       exposure = c(1, 2, 7, 7, 5, 6))

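The new column postcode pastes together the Postcode values that share the same Latitude and Longitude. coords pastes Latitude and Longitude together, and exposure sums Exposure over the rows that have the same coords, i.e. the same Latitude and Longitude.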

I can do this using the dplyr package and a for loop:

x <- mutate(x, coords = paste(Latitude, Longitude, sep = ", ")) 
x <- cbind(x, postcode = rep(0, nrow(x)), exposure = rep(0, nrow(x))) 
for(i in unique(x$coords)){ 
    x$postcode[x$coords == i] <- paste(x$Postcode[x$coords == i], collapse = ", ") 
    x$exposure[x$coords == i] <- sum(x$Exposure[x$coords == i]) 
}

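How can I do this using only the dplyr package, without a for loop? Other approaches that are more efficient than a for loop are also welcome, because my actual dataset is quite large.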

Source

2016-12-30 Hardian Lawi

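The second dataset has an unequal number of elements. Please update it. – akrun

@akrun I have edited it. Thanks for the tip. –

It will be closed if you do not fix it: Error in data.frame(Postcode = c(0, 1, 2, 3, 4, 5, 6), Latitude = c(3.1, : arguments imply differing number of rows: 7, 6 – hrbrmstr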

Here is how you can do it with dplyr:

library(dplyr) 
x %>% 
    group_by(coords = paste(Latitude, Longitude, sep = ", ")) %>% 
    mutate(postcode = toString(Postcode), exposure = sum(Exposure)) 

# Source: local data frame [6 x 7] 
# Groups: coords [5] 
# 
# Postcode Latitude Longitude Exposure coords postcode exposure 
#  <dbl> <dbl>  <dbl> <dbl> <chr> <chr> <dbl> 
# 1  1  3.1  100  1 3.1, 100  1  1 
# 2  2  3.2  101  2 3.2, 101  2  2 
# 3  3  3.3  102  3 3.3, 102  3, 4  7 
# 4  4  3.3  102  4 3.3, 102  3, 4  7 
# 5  5  3.4  103  5 3.4, 103  5  5 
# 6  6  3.4  104  6 3.4, 104  6  6

Source

2016-12-30 03:49:03 Psidom
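
The result printed above is still grouped by coords. As a minimal sketch, assuming later steps should not inherit that grouping, ungroup() can be appended. The second pipeline is a variant that the question did not ask for: it uses summarise() to return one row per coordinate pair instead of one row per postcode. The names res and per_coord are made up for this illustration.

```r
library(dplyr)

x <- data.frame(Postcode = c(1, 2, 3, 4, 5, 6),
                Latitude = c(3.1, 3.2, 3.3, 3.3, 3.4, 3.4),
                Longitude = c(100, 101, 102, 102, 103, 104),
                Exposure = c(1, 2, 3, 4, 5, 6))

# Same grouped mutate as in the answer, followed by ungroup() so that
# later verbs operate on the whole table again.
res <- x %>%
  group_by(coords = paste(Latitude, Longitude, sep = ", ")) %>%
  mutate(postcode = toString(Postcode), exposure = sum(Exposure)) %>%
  ungroup()

# Variant (an assumption about what may be useful, not part of the question):
# collapse to one row per coordinate pair.
per_coord <- x %>%
  group_by(coords = paste(Latitude, Longitude, sep = ", ")) %>%
  summarise(postcode = toString(Postcode), exposure = sum(Exposure))
```

mutate() keeps all six rows and repeats the group values, while summarise() keeps only the five distinct coordinate pairs.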

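Hi, thank you very much for your solution. My data is actually stored in a SpatialPointsDataFrame object. I thought I could use this approach to manipulate it, but I just realized that I cannot use group_by to access the data inside a SpatialPointsDataFrame object. Any suggestions? –

I suppose I could extract the data from the sp object first and then apply dplyr, but I noticed that this changes the object, so I cannot store it back into the sp object. –

Anyway, thanks. I did not know you could use group_by this way. –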

library(dplyr) 
library(tidyr) # unite() was used to join Lat, Lon 

x %>% unite(coords, Latitude, Longitude, sep = ",", remove = FALSE) %>% 
    group_by(coords) %>% mutate(exposure = sum(Exposure), postcode = toString(Postcode))

Source

2016-12-30 03:53:20

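Hi, thank you very much for your solution. My data is actually stored in a SpatialPointsDataFrame object. I thought I could use this approach to manipulate it, but I just realized that I cannot use group_by to access the data inside a SpatialPointsDataFrame object. Any suggestions? –

I suppose I could extract the data from the sp object first and then apply dplyr, but I noticed that this changes the object, so I cannot store it back into the sp object. –

Actually, I have not worked much with 'sp' objects. Could you share the details of what you are facing? –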
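
One possible way around the SpatialPointsDataFrame problem, offered only as a sketch: it assumes the object was built with sp::coordinates<- and that its attribute table sits in the @data slot. The idea is to build the grouping key from coordinates(), run the grouped mutate on @data, and convert the result back to a plain data.frame before storing it. Because mutate() keeps the number and order of rows, the attribute table should still line up with the points. The object name spdf and the helper vector key are made up for this illustration.

```r
library(sp)
library(dplyr)

x <- data.frame(Postcode = c(1, 2, 3, 4, 5, 6),
                Latitude = c(3.1, 3.2, 3.3, 3.3, 3.4, 3.4),
                Longitude = c(100, 101, 102, 102, 103, 104),
                Exposure = c(1, 2, 3, 4, 5, 6))

# Assumption: the sp object is created like this. Latitude and Longitude
# move out of @data and into the coordinate matrix.
spdf <- x
coordinates(spdf) <- ~ Longitude + Latitude

# Build the grouping key from the coordinate matrix.
xy <- coordinates(spdf)
key <- paste(xy[, "Latitude"], xy[, "Longitude"], sep = ", ")

# Grouped mutate on the attribute table only, then drop the grouping and
# the tibble class before writing it back into the @data slot.
spdf@data <- spdf@data %>%
  mutate(coords = key) %>%
  group_by(coords) %>%
  mutate(postcode = toString(Postcode), exposure = sum(Exposure)) %>%
  ungroup() %>%
  as.data.frame()
```

The final as.data.frame() is there because a grouped tibble in @data may be what "changes the object"; that is a guess about the problem described in the comments, not something confirmed in this thread.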

We can do this with data.table:

library(data.table) 
setDT(x)[, coords := paste(Latitude, Longitude, sep=",") 
    ][, c("exposure", "postcode") := .(sum(Exposure), toString(Postcode)), coords] 
x 
# Postcode Latitude Longitude Exposure coords exposure postcode 
#1:  1  3.1  100  1 3.1,100  1  1 
#2:  2  3.2  101  2 3.2,101  2  2 
#3:  3  3.3  102  3 3.3,102  7  3, 4 
#4:  4  3.3  102  4 3.3,102  7  3, 4 
#5:  5  3.4  103  5 3.4,103  5  5 
#6:  6  3.4  104  6 3.4,104  6  6

Source

2016-12-30 05:20:02 akrun
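
For comparison, the same per-group values can also be sketched in base R without any packages, using ave(), which applies a function within each group and returns a vector of the original length. This is an alternative not proposed in the thread, and it may or may not be faster than the loop on a large dataset.

```r
x <- data.frame(Postcode = c(1, 2, 3, 4, 5, 6),
                Latitude = c(3.1, 3.2, 3.3, 3.3, 3.4, 3.4),
                Longitude = c(100, 101, 102, 102, 103, 104),
                Exposure = c(1, 2, 3, 4, 5, 6))

# Key that identifies rows with identical Latitude and Longitude.
x$coords <- paste(x$Latitude, x$Longitude, sep = ", ")

# ave() evaluates FUN per group and recycles the result over the group's rows.
# Postcode is converted to character first so the result can hold "3, 4".
x$postcode <- ave(as.character(x$Postcode), x$coords, FUN = toString)
x$exposure <- ave(x$Exposure, x$coords, FUN = sum)
```

Note that FUN has to be passed by name, because every unnamed argument after the first is treated as a grouping variable.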
