2017-08-08

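Calculating distances and subsetting with multiple for loops

Hi everyone. I am trying to filter GPS location data by distance (UTM coordinates) and time (H:M:S) criteria, both independently and simultaneously. Here is the data structure: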

head(collar) 
FID animal  date  time  zone easting northing 
1 URAM01_2012 6/24/2012 10:00:00 AM 13S 356664 3971340 
2 URAM01_2012 6/24/2012 1:02:00 PM 13S 356760 3971480 
3 URAM01_2012 6/24/2012 4:01:00 PM 13S 357482 3972325 
4 URAM01_2012 6/24/2012 7:01:00 PM 13S 356882 3971327 
5 URAM01_2012 6/25/2012 4:01:00 AM 13S 356574 3971765 
6 URAM01_2012 6/25/2012 7:01:00 AM 13S 357796 3972231 

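Right now I am filtering by distance only, but I am having some problems. The code should calculate the distance between FID[1] and FID[2], then assign that distance to FID[1] in a new column ($step.length). Once all distances are calculated, the data are subset according to a distance rule. At the moment I have it set to keep only distances greater than 200 m. After subsetting, the process repeats until the distance between all successive locations is > 200 m. Below is the code I have written, which does only part of what I want: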

reps <- 10 
#Begin loop for the number of reps. Right now it's at 10 just to see if the code works. 
for(rep in 1:reps){ 

    #Begin loop for the number of GPS locations in the file 
    for(i in 1:length(collar$FID)){ 

    #Calculate the distance between a GPS location and the next GPS locations. the formula is the hypotenuse of the Pythagorean theorem. 
    collar$step.length[i] <- sqrt(((collar$easting[i] - collar$easting[i+1])^2) + ((collar$northing[i] - collar$northing[i+1])^2)) 

    } 

    #Subset the data. Select all locations that are >200m from the next GPS location. 
    collar <- subset(collar, step.length >200) 

} 

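Now, the code is not perfect, and I would like to add two conditions to it.

1.) Animal ID is not taken into account. As a result, the distance for an animal's last location is calculated using the first location of the next animal, when it should be NA. I thought using for(i in 1:unique(collar$animal)) might work, but it didn't (shocking), and I am not sure how to loop over only the unique values.

2.) I would also like to insert a break into the for loop once all locations are > 200 m apart. I am sure there must be a better way to do this, but I thought I would set reps to something large (e.g., 10000) and have R break once the condition is met: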

if(collar$step.length > 200){ 
    break } 

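However, this produces the warning that the condition has length > 1 and only the first element will be used. I have not thought about time, or distance and time together, yet, but if anyone has suggestions for those I would appreciate the advice. Thanks for your help and guidance.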
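Both points in the question can be illustrated with a minimal sketch. It assumes a made-up data frame `toy` in place of `collar`, and a hypothetical helper `step_to_next()`; neither comes from the original post. Looping over `unique(toy$animal)` directly (rather than `1:unique(...)`) visits each animal once, and `all(..., na.rm = TRUE)` collapses the comparison to the single TRUE/FALSE that `if` expects (recent R versions treat a longer condition as an error rather than a warning).

```r
# Toy data standing in for `collar` (values are made up for illustration)
toy <- data.frame(
  animal   = c("A", "A", "A", "B", "B"),
  easting  = c(0, 300, 600, 5000, 5300),
  northing = c(0, 0, 0, 0, 0)
)

# Hypothetical helper: distance from each point to the next one,
# with NA for the last point because it has no successor
step_to_next <- function(e, n) c(sqrt(diff(e)^2 + diff(n)^2), NA)

# Loop over the animal IDs themselves, not over 1:unique(...)
toy$step.length <- NA_real_
for (id in unique(toy$animal)) {
  rows <- toy$animal == id
  toy$step.length[rows] <- step_to_next(toy$easting[rows], toy$northing[rows])
}

# A length-one condition: TRUE only if every non-missing step exceeds 200 m
done <- all(toy$step.length > 200, na.rm = TRUE)
if (done) message("all consecutive locations are more than 200 m apart")
```

Inside a `for` or `repeat` loop, the same `if (done) break` test could stop the iteration without choosing an arbitrarily large number of reps.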

Answers


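I don't fully understand what you want to do with reps, but you can use the split and unsplit functions to work on each animal separately.

First, I created a distance() function that looks up the columns named easting and northing in an object and builds a vector of distances. Then we split collar by animal and apply the distance function to each animal. We attach this list of distances to the list of animals with some mapply code, and then unsplit the result to put everything back together.

Let me know what you want to do with the "> 200" step.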

distance <- function(x){ 
    easting <- x$easting 
    northing <- x$northing 
    easting2 <- c(easting[-1], NA) 
    northing2 <- c(northing[-1], NA) 
    sqrt((easting - easting2)^2 + (northing - northing2)^2) 
} 
s <- split(collar, collar$animal) 
distances <- lapply(s, distance) 
s2 <- mapply(cbind, s, "Distance" = distances, SIMPLIFY = F) 
collar.new <- unsplit(s2, collar$animal) 
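A compact variant of the same split/unsplit idea, as a sketch only: `unsplit()` also accepts a list of plain vectors, so the per-animal distances can be written straight back into a column in the original row order, even when rows from different animals are interleaved. The data frame `toy` and the helper `step_len()` are made up for illustration.

```r
# Made-up data with interleaved animals (not from the original post)
toy <- data.frame(
  animal   = c("A", "B", "A", "B"),
  easting  = c(0, 1000, 3, 1000),
  northing = c(0, 0, 4, 12)
)

# Hypothetical helper: distance to the next row, NA for the last row
step_len <- function(x) c(sqrt(diff(x$easting)^2 + diff(x$northing)^2), NA)

# One distance vector per animal, then back into the original row order
per.animal <- lapply(split(toy, toy$animal), step_len)
toy$Distance <- unsplit(per.animal, toy$animal)
toy
```

Each animal's last row gets NA, so no distance is ever computed across two different animals.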

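EDIT:

Apologies if this is cumbersome; I am sure I could make it shorter, but for now, let me know whether it works for you. I am also curious to see how fast it runs, since I have been making up my own data.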

filterout <- function(input, value = NULL){ 
    # requirements of the input object 
    stopifnot(all(c("FID","animal","easting","northing") %in% colnames(input))) 
    distance <- function(x){ # internal distance function 
    e1 <- x$easting; e2 <- c(NA, e1[-nrow(x)]) 
    n1 <- x$northing; n2 <- c(NA, n1[-nrow(x)]) 
    sqrt((e1 - e2)^2 + (n1 - n2)^2) 
    } 
    nc <- ncol(input) # save so we can "rewrite" Distance values each reiteration 
    f <- function(input){ # the recursive function (will run until condition is met) 
    z <- split(input[,-(nc+1)], input$animal) # split by animal & remove (if any) prior Distance column 
    distances <- lapply(z, distance) # collect distances 
    z2 <- mapply(cbind, z, "Distance" = distances, SIMPLIFY = F) # attach distances 
    r1 <- lapply(z2, function(x) { # delete first row under criteria 
     a <- x$Distance < value # CRITERIA 
     a[is.na(a)] <- FALSE # Corrects NA values into FALSE so we don't lose them 
     first <- which(a == T)[1] # we want to remove one at a time 
     `if`(is.na(first), integer(0), x$FID[first]) # returns FIDs to remove 
    }) 
    z3 <- unsplit(z2, input$animal) 
    # Whether to keep going or not 
    if(length(unlist(r1)) != 0){ # if list of rows under criteria is not empty 
     remove <- which(z3$FID %in% unlist(r1, use.names = F)) # remove them 
     print(unlist(r1, use.names = F)) # OPTIONAL*** printing removed FIDs 
     f(z3[-remove,]) # and run again 
    } else { 
     return(z3) # otherwise return the final list 
    } 
    } 
    f(input) 
} 

The function can be used as follows:

filterout(input = collar, value = 200) 
filterout(input = collar, value = 400) 
filterout(input = collar, value = 600) 

EDIT2:

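I opened a bounty question to figure out how to do one particular step, but I hope this answer helps. It may take about a minute to process 37K rows, but let me know how it goes.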

x <- collar 

skipdistance <- function(x, value = 200){ 
    d <- as.matrix(dist(x[,c("easting","northing")])) 
    d[lower.tri(d)] <- 0 
    pick <- which(d > value, arr.ind = T) # pick[order(pick[,"row"]),] # visual clarity 

    findConnectionsBase <- function(m) { 
    n <- nrow(m) 
    myConnections <- matrix(integer(0), nrow = n, ncol = 2) 
    i <- j <- 1L 
    k <- 2L 
    while (i <= n) { 
     myConnections[j, ] <- m[i, ] 
     while (k <= n && m[i, 2] != m[k, 1]) {k <- k + 1L} 
     i <- k 
     j <- j + 1L 
    } 
    myConnections[!is.na(myConnections[,1]), ] 
    } 

    keep.ind <- findConnectionsBase(pick) 
    keep.row <- unique(c(keep.ind)) 
    cbind(x[keep.row,], Distance = c(NA,d[keep.ind])) 
} 

a <- do.call(rbind,lapply(split(x, x$animal), skipdistance, value = 200)) 
dim(a) 

EDIT #3:

library(lubridate) # great package for string -> dates 

# changed to give just rows that satisfy greater than value criteria 
skip <- function(dist.var, value = 200){ 
    d <- as.matrix(dist(dist.var)) 
    d[lower.tri(d)] <- 0 
    pick <- which(d > value, arr.ind = T) # pick[order(pick[,"row"]),] # visual clarity 
    findConnectionsBase <- function(m) { 
    n <- nrow(m) 
    myConnections <- matrix(integer(0), nrow = n, ncol = 2) 
    i <- j <- 1L 
    k <- 2L 
    while (i <= n) { 
     myConnections[j, ] <- m[i, ] 
     while (k <= n && m[i, 2] != m[k, 1]) {k <- k + 1L} 
     i <- k 
     j <- j + 1L 
    } 
    myConnections[!is.na(myConnections[,1]), ] 
    } 
    unique(c(findConnectionsBase(pick))) 
} 

collar <- structure(list(FID = 1:8, animal = c("URAM01_2012", "URAM01_2012", "URAM01_2012", "URAM01_2012", "URAM01_2013", "URAM01_2013", "URAM01_2013", "URAM01_2013"), date = c("6/24/2012", "6/24/2012", "6/24/2012", "6/24/2012", "6/25/2012", "6/25/2012", "6/25/2012", "6/25/2012" ), time = c("10:00:00AM", "1:02:00PM", "4:01:00PM", "7:01:00PM", "4:01:00AM", "7:01:00AM", "7:01:00AM", "7:01:00AM"), zone = c("13S", "13S", "13S", "13S", "13S", "13S", "13S", "13S"), easting = c(356664L, 
356760L, 356762L, 356882L, 356574L, 357796L, 357720L, 357300L), northing = c(3971340L, 3971480L, 3971498L, 3971498L, 3971765L, 3972231L, 3972230L, 3972531L)), .Names = c("FID", "animal", "date", "time", "zone", "easting", "northing"), class = "data.frame", row.names = c(NA, -8L)) 


collar[skip(dist.var = collar[,c("easting","northing")], 
      value = 200),] 
# dist function works on dates, but it makes sense to convert to hours 
dist(lubridate::mdy_hms(paste(collar$date, collar$time))) 
hours <- 2.99 
collar[ skip(dist.var = lubridate::mdy_hms(paste(collar$date, collar$time)), 
      value = hours * 3600), ] 
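The time threshold above is expressed in seconds (hours * 3600). As a rough cross-check, the gaps between consecutive fixes can also be computed in hours with base R alone. This is only a sketch: the `stamp` vector is made up in the same layout as the example data, and parsing `%p` (AM/PM) assumes an English-style locale.

```r
# Made-up timestamps in the same "m/d/Y h:m:sAM" layout as the example data
stamp <- c("6/24/2012 10:00:00AM", "6/24/2012 1:02:00PM", "6/24/2012 4:01:00PM")

# Parse as date-times; %I is the 12-hour clock and %p the AM/PM marker
when <- as.POSIXct(stamp, format = "%m/%d/%Y %I:%M:%S%p", tz = "UTC")

# Gap between each fix and the previous one, in hours
gap.hours <- as.numeric(diff(when), units = "hours")
round(gap.hours, 2)
```

With a 2.99 hour threshold, the second gap (2 h 59 min) would fall just under the cutoff while the first (3 h 2 min) would pass.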

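Huh. Very interesting. It is always neat to see different ways of solving a problem. First of all, thank you for your help, @efriedland. I really appreciate it. To start, 200 is the distance (in meters) I am currently using to decide whether a location should stay. I will also increase this distance to 400 m, 600 m, and so on in future trials. The goal is to try to remove spatial autocorrelation between locations.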


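Now, if the code runs once, each point has a pairwise distance to the point that follows it. After removing the points that are < 200 m apart, there are still locations in the data set that are < 200 m apart. For example, pt1 is 100 m from pt2 and 175 m from pt3. After the first round of filtering, the code would have to run again to generate the distance between pt1 and pt3 (175 m) and then, after subsetting again, remove pt3. That is what 'reps' is for: starting the whole process over. At some point, all consecutive locations should be more than 200 m from each other.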
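One reading of the pt1/pt2/pt3 example is a single greedy pass: keep the first point, then keep each later point only if it is farther than the threshold from the most recently kept point. The sketch below follows that reading; the function name `thin_by_distance()` and the coordinates are made up for illustration, and it is not the code from the answer.

```r
# Greedy thinning: returns TRUE for the points that are kept
thin_by_distance <- function(easting, northing, min.dist = 200) {
  keep <- logical(length(easting))
  if (length(easting) == 0) return(keep)
  keep[1] <- TRUE   # the first point is always kept
  last <- 1         # index of the most recently kept point
  for (i in seq_along(easting)[-1]) {
    d <- sqrt((easting[i] - easting[last])^2 + (northing[i] - northing[last])^2)
    if (d > min.dist) {
      keep[i] <- TRUE
      last <- i
    }
  }
  keep
}

# pt2 is 100 m and pt3 is 175 m from pt1, so both are dropped;
# pt4 is 250 m from pt1, so it is kept
keep <- thin_by_distance(easting = c(0, 100, 175, 250), northing = c(0, 0, 0, 0))
keep
```

Because every comparison is made against the last kept point, one pass is enough and no repeated subsetting is needed. Applied per animal (for example via split), it would also avoid comparing points across animals.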


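I am taking a stab at this now. In your example, pt1 -> pt2 is 100 m, so we put that distance value on the pt1 row. But we want to delete pt2... What would you say to putting the 100 m on pt2 instead and starting with NA on pt1?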


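Thanks a lot, and a shout-out to Evan for all his hard work. Obviously the code he produced is a bit different from what I came up with, but that is the great thing about this community: sharing unique solutions we might not have thought of ourselves. See EDIT2 for the final code for filtering GPS collar data by the distance between consecutive points.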