2017-04-06 68 views

In R, using dplyr functions to find the minimum distance

I have a data frame with two numeric variables, lat and long, like this:

> head(pontos_sub) 
    id  lat  long 
1 0 -22,91223 -43,18810 
2 1 -22,91219 -43,18804 
3 2 -22,91225 -43,18816 
4 3 -22,89973 -43,20855 
5 4 -22,89970 -43,20860 
6 5 -22,89980 -43,20860 

Next, I round the coordinates to three decimal places:

pontos_sub$long_r <- round(pontos_sub$long, 3) 
pontos_sub$lat_r <- round(pontos_sub$lat, 3) 

> head(pontos_sub) 
    id  lat  long long_r lat_r 
1 0 -22,91223 -43,18810 -43,188 -22,912 
2 1 -22,91219 -43,18804 -43,188 -22,912 
3 2 -22,91225 -43,18816 -43,188 -22,912 
4 3 -22,89973 -43,20855 -43,209 -22,900 
5 4 -22,89970 -43,20860 -43,209 -22,900 
6 5 -22,89980 -43,20860 -43,209 -22,900 

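Now I want to use dplyr to find, for each unique long_r/lat_r group, the point with the minimum distance among all the lat/long points of that group, using the distVincentyEllipsoid function. Something like: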

> newdata <- pontos_sub %>% 
       group_by(long_r,lat_r) %>% 
       summarise(min_long = special_fun(arg), 
         min_lat = special_fun(arg)) 

and get a result like this:

> head(newdata) 
    long_r lat_r min_long min_lat 
1 -43,188 -22,912 xxxxxx xxxxxxx 
4 -43,209 -22,900 xxxxxx xxxxxxx 

Finally, is this a fast approach, or is there a faster way? Thanks.

Answer


You can do it like this:

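library(dplyr)
library(geosphere)  # provides distVincentyEllipsoid()

pontos_sub %>% 
    # distance (in metres) from each point to the rounded centre of its group
    mutate(dist = distVincentyEllipsoid(cbind(long, lat), cbind(long_r, lat_r))) %>% 
    group_by(long_r, lat_r) %>% 
    arrange(dist) %>% 
    slice(1) %>%    # keep only the closest point in each group
    rename(min_long = long, min_lat = lat) %>% 
    select(long_r, lat_r, min_long, min_lat) 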

# Source: local data frame [2 x 4] 
# Groups: long_r, lat_r [2] 
# 
# long_r lat_r min_long min_lat 
#  <dbl> <dbl>  <dbl>  <dbl> 
# 1 -43.209 -22.900 -43.20860 -22.89980 
# 2 -43.188 -22.912 -43.18804 -22.91219 
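On the speed question: a possible variant, sketched here without benchmarking, replaces arrange() plus slice(1) with summarise() and which.min(), so each group just takes the row with the smallest distance instead of sorting all rows first. It assumes the same data and packages (dplyr, geosphere) as above; the extra min_dist column is added only for illustration and is not part of the requested output.

```r
library(dplyr)
library(geosphere)  # provides distVincentyEllipsoid()

# same sample data as in the question (comma as decimal separator)
pontos_sub <- read.table(text = "
    id  lat  long
1 0 -22,91223 -43,18810
2 1 -22,91219 -43,18804
3 2 -22,91225 -43,18816
4 3 -22,89973 -43,20855
5 4 -22,89970 -43,20860
6 5 -22,89980 -43,20860
", dec = ",")

pontos_sub$long_r <- round(pontos_sub$long, 3)
pontos_sub$lat_r  <- round(pontos_sub$lat, 3)

newdata <- pontos_sub %>%
  # distance (in metres) from each point to its group's rounded centre
  mutate(dist = distVincentyEllipsoid(cbind(long, lat), cbind(long_r, lat_r))) %>%
  group_by(long_r, lat_r) %>%
  # pick the closest point per group without sorting the whole table
  summarise(min_long = long[which.min(dist)],
            min_lat  = lat[which.min(dist)],
            min_dist = min(dist)) %>%  # illustrative extra column
  ungroup()

newdata
```

For very large tables, whether this is actually faster than the arrange()/slice() version should be checked with your own data.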

Data:

pontos_sub <- read.table(text=" 
    id  lat  long 
1 0 -22,91223 -43,18810 
2 1 -22,91219 -43,18804 
3 2 -22,91225 -43,18816 
4 3 -22,89973 -43,20855 
5 4 -22,89970 -43,20860 
6 5 -22,89980 -43,20860     
       ", dec = ",") 

pontos_sub$long_r <- round(pontos_sub$long, 3) 
pontos_sub$lat_r <- round(pontos_sub$lat, 3)