2016-06-14 98 views
2

我有以下數據集,它由圍繞64000行的:while循環在for循環中,有沒有更簡單快捷的方法?

Trial.time Recording.time X.center Y.center Area Areachange Elongation Distance.moved Movement.Moving...Center.point. 
2  300.030   0.000 -49.1651 31.9676 0.917085 0.65113 0.851349    -        - 
22  300.696   0.666 -48.4404 31.9945 0.816206 0.715326 0.831207  0.725139        1 
24  300.763   0.733 -47.996 32.0696 0.834547 0.412688 0.856234  0.450784        1 
33  301.063   1.033 -47.6583 32.0598 0.75201 0.137563 0.716028  0.337775        1 
41  301.330   1.299 -47.3385 32.0139 0.843718 0.302638 0.838526  0.323117        1 
98  303.230   3.199 -47.3914 31.6981 0.944598 1.26558 0.847969  0.32022        1 
113 303.730   3.699 -47.3807 31.0614 0.86206 1.24724 0.761099  0.636771        1 
114 303.763   3.733 -47.1308 30.3858 1.00879  1.1005 0.809162  0.72036        1 
116 303.830   3.799 -47.1914 30.0551 1.01796 0.440201 0.831924  0.336155        1 

一般而言,描述了在一個特定的Recording.time物體的運動(Distance.Moved)。如果連續兩行的Recording.time小於0.035,則兩行都屬於一次移動。相反,如果它更大,則時間點代表兩個單獨的移動。我的工作是確定每個運動的長度,因此連續多少行給出一個運動以及運動內的總距離。我寫了下面的代碼,它可行,但速度很慢,我想問你是否有任何想法如何提高速度。

time <- c() 
j.final <- c() 

#Go through all rows of the data.frame 
for(i in 1:length(data2[,1])){ 
    i <- 1 
    j <- 1 
    if (!is.na(data2$Recording.time[i+1])){ 

    # As long as the distance between two consecutive time points is smaller than 0.035, increase the counter by one 
    while (data2$Recording.time[i+1]-data2$Recording.time[i] <= 0.035){ 
     j <- j+1 
     i <- i+1 
    } 
    # Save the number of consecutive time points 
    j.final <- rbind(j.final,j) 
    # Save the time of the last movement frame 
    time <- rbind(time,data2$Recording.time[j]) 
    # Delete the amount of rows that gave one single movement 
    data2 <- data2[-(1:j),] 
    } 
} 
final <- cbind(j.final,time) 

#Same as above... Continouslz rows out of the data.frame 
data2 <- data1 
for (i in 1:length(j.final)){ 
    Dtotal <- sum(data2$Distance.moved[1:j.final[i]]) 
    distance <- rbind(distance, Dtotal) 
    data2 <- data2[-(1:j.final[i]),] 
} 
final <- cbind(final,distance) 
dimnames(final) <- list(NULL,c("Frames","Time","Distance")) 
epicfinal <- as.data.frame(final) 

最後的結果看起來是這樣的(請不要介意速度)

Frames Time Distance velocity 
1  1 0.033 0.0407652 0.001386017 
2  18 0.666 1.4887506 0.911115367 
3  3 0.799 0.0912680 0.009309336 
4  7 1.066 0.3703880 0.088152344 
5  2 1.166 0.0371303 0.002524860 
6  3 1.299 0.1013617 0.010338893 
+3

查找到'lead','lag','cumsum'功能。 – zx8754

回答

5

由於zx8754指出,這是很容易與lag(或更好的,他的快方面取得內部data.tableshift )和cumsum函數。
我使用data.table包速度(注意語法是從經典data.frames太大的不同,與data.table可以子集劃分的表時,把表達的j的說法,而不是簡單地在data.frame選擇列)。

library(data.table) 

## VARIABLE CREATION: 
# Create a column which indicates the lag between two observations 
data$lag <- data$Recording.time-shift(data$Recording.time) 
data$lag[1] <- 0 # The first value is always NA: fix it 
data$newmovement <- data$lag<0.035 # Binary variable: T if there's a new movement, F otherwise 
data$movement_index <- cumsum(data$newmovement) # Index to identify the movement 

## COMPUTATIONS: 
# Use the data.table package for fast computations 
data <- data.table(data) 
data[,.(length_movement=.N, # Length (nrows) for each movement 
     total_distance=sum(Distance.moved,na.rm = T)), # Total distance: sum of distances for each movement 
     by=movement_index] # Subset by=movement_index 

# movement_index length_movement total_distance 
# 1:    1    7  2.793806 
# 2:    2    2  1.056515 

注意,##VARIABLE CREATION部分可通過data.table包來實現了。
這可能會導致額外的速度提升,你可以做到這一點通過以下替換代碼的第一部分:

## VARIABLE CREATION: 
data[,lag:=Recording.time-shift(Recording.time)][1,lag:=0L] 
data[,newmovement:=lag<0.035] 
data[,movement_index:=cumsum(newmovement)] 
+2

'data $ lag <0.035'已經是一個二元變量。不需要'ifelse'只需使用'!(data $ lag <0.035)'或'data $ lag> = 0.035'。你可以在data.table中做同樣的事情(不用'replace')。 'cumsum'會將邏輯值自動強制爲整數。 – Roland

+0

謝謝,我用你的建議編輯了代碼。 – hellter

+1

非常感謝您的詳細回覆。但是,要解決我的問題,您必須使用shift而不是lag(數據$ lag < - data $ Recording.time-shift(data $ Recording.time))。 – Urumpel