2011-11-21

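R: replacing data frame records using vectorization

I'd like to know whether there is any way to solve the following problem efficiently. I have a collection of X-Y points. For each point I need to generate a certain number of records, and at the end I need to stack all the generated records together. Originally I did this with a FOR loop, using rbind in each iteration to stack the data.frame. I have now changed it so that the dimensions of the final stack of records are defined up front, and I'm trying to replace those 0s with the generated values. My code is posted below (the place where I got stuck is marked with **). If you could give me a hint, or even a better solution, that would be perfect!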

colonies <- read.table(text =    
' X  Y  Timecount ID_col Age 
582906.4 2883317  2004  1 15 
583345.9 2883102  2004  2 4 
583119.5 2883621  2004  3 13 
583385.0 2882933  2004  4 5 
583374.0 2882936  2004  5 2 
583271.0 2883076  2004  7 5 
582898.9 2883229  2004  8 1 
582927.9 2883234  2004  9 20 
582956.7 2883272  2004  10 13 
582958.8 2883249  2004  11 3', header = TRUE) 

year = 2004 
survival_prob = 0.01 
male_prob = 0.5 

Present <- colonies$Timecount == year 

app <- sum(colonies$Age[Present] >= 4 & colonies$Age[Present] < 10) * 1000 * survival_prob 
app2 <- sum(colonies$Age[Present] >= 10 & colonies$Age[Present] < 15) * 10000 * survival_prob 
app3 <- sum(colonies$Age[Present] >= 15 & colonies$Age[Present] <= 20) * 100000 * survival_prob 

size <- app + app2 + app3 

pop <- data.frame(matrix(0,nrow=size,ncol=2)) 
colnames(pop) <- c("X","Y") 

if (dim(pop)[1] > 0){ 

#FOR cycle going through each existing point 
for (i in 1:sum(Present)){  

    if (colonies[Present,]$Age[i] < 4) { next 
    } else if (colonies[Present,]$Age[i] >= 4 & colonies[Present,]$Age[i] < 10) { alates <- 1000 
    } else if (colonies[Present,]$Age[i] >= 10 & colonies[Present,]$Age[i] < 15) { alates <- 10000 
    } else if (colonies[Present,]$Age[i] >= 15 & colonies[Present,]$Age[i] <= 20) { alates <- 100000 
    } 

    indiv <- alates * survival_prob 
    #Initialize two coordinate variables based on the established (or existing) colonies 
    X_temp <- round(colonies[Present,]$X[i],2) 
    Y_temp <- round(colonies[Present,]$Y[i],2) 
    distance <- rexp(indiv,rate=1/200) 
    theta <- runif(indiv, 0, 2*pi) 
    C <- cos(theta) 
    S <- sin(theta) 
    #XY coords (meters) using polar coordinate transformations 
    X <- X_temp + round(S * distance,2) 
    Y <- Y_temp + round(C * distance,2) 
    pop[,] <- c(X,Y) # ****** HERE I GOT STUCK... it should be pop[1:indiv,]
        # but then the next i would overwrite those same rows...

    } 
    pop$Sex <- rbinom(size,1,male_prob) 
    pop$ID <- 1:dim(pop)[1] 
} 
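To make the target concrete, here is the arithmetic for size on the ten sample rows as a quick check. This is a sketch: the ages are copied from the table above, and the names n_small, n_medium and n_large are only illustrative. The three colonies younger than 4 (ages 2, 1 and 3) contribute nothing.

```r
# Ages of the ten sample colonies posted above (all have Timecount == 2004)
age <- c(15, 4, 13, 5, 2, 5, 1, 20, 13, 3)
survival_prob <- 0.01

# Surviving individuals per age class (illustrative names)
n_small  <- sum(age >= 4  & age < 10)  * 1000   * survival_prob  # 3 colonies -> 30
n_medium <- sum(age >= 10 & age < 15)  * 10000  * survival_prob  # 2 colonies -> 200
n_large  <- sum(age >= 15 & age <= 20) * 100000 * survival_prob  # 2 colonies -> 2000

size <- n_small + n_medium + n_large
print(size)
# [1] 2230
```

So pop needs 2230 rows here, and each colony's block of rows has to start where the previous colony's block ended.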

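The code seems to have a problem... do you really want to do nothing for ages under 4? If so, drop those rows right away. It looks to me like this could all be vectorized. Please comment it better, and perhaps give a better description of what you want to accomplish. – John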

Answer


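I believe this is what you're looking for: nicely expressed, vectorized R code. No loops, not even apply-family or plyr commands. There's a lot you could do to make it more flexible, but the core vectorization using rep, and a single call to generate all of your random distances, is what really matters. I'm not sure why you had an if clause on the dimensions of pop; you'll need to handle that case differently, since that check doesn't carry over to this version.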
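The trick that replaces the loop is rep() with a vector of counts: each colony coordinate is repeated once per offspring, so one vectorized draw can cover every offspring at once. A toy sketch with made-up coordinates and counts (not the question's data):

```r
# Hypothetical toy data: two colonies producing 2 and 3 offspring
colony_x <- c(100, 200)
n_offspring <- c(2, 3)

# rep() with a vector 'times' repeats each element its own number of times,
# giving one parent coordinate per offspring row
parent_x <- rep(colony_x, times = n_offspring)
print(parent_x)
# [1] 100 100 200 200 200

# One call each to rexp() and runif() then covers all offspring
set.seed(1)
distance <- rexp(length(parent_x), rate = 1/200)
theta <- runif(length(parent_x), 0, 2 * pi)
offspring_x <- parent_x + sin(theta) * distance
```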

year = 2004 
survival_prob = 0.01 
male_prob = 0.5 

# you don't do anything in your for loop or save any of the results if the age is 
# less than 4. I'm going to just remove that from colonies on the assumption that it's 
# larger than posted and comes from a file that you won't change. Where I edit 
# colonies you might want to work with a copy. 
colonies <- colonies[colonies$Age >= 4,] 

# only the Present selection of colonies is ever used in this code, so you could also stop
# selecting it repeatedly... in your real code you might make a copy for this one, something
# like coloniesP. In general, you want as little going on in a
# loop and as little repetition as possible. Note, this might be memory
# intensive if colonies is actually very large. Feel free to go back to selecting
# since it would happen much less frequently in the new code anyway.
Present <- colonies$Timecount == year 
colonies <- colonies[Present,] 

# no changes up to the calculation of size; from here on, everything is vectorized
app <- sum(colonies$Age >= 4 & colonies$Age < 10) * 1000 * survival_prob 
app2 <- sum(colonies$Age >= 10 & colonies$Age < 15) * 10000 * survival_prob 
app3 <- sum(colonies$Age >= 15 & colonies$Age <= 20) * 100000 * survival_prob 

size <- app + app2 + app3 

#note that ifelse can be used to declare alates as vectors 
alates <- ifelse(colonies$Age >= 4 & colonies$Age < 10, 1000, 100000) 
alates <- ifelse(colonies$Age >= 10 & colonies$Age < 15, 10000, alates) 

# as a consequence, more stuff can be vectorized 
indiv <- alates * survival_prob 

# we can do some cool stuff with rep to continue vectorizing 
# (round when done if you must) 
X_temp <- rep(colonies$X, indiv) 
Y_temp <- rep(colonies$Y, indiv)

#Initialize two coordinate variables based on the established (or existing) colonies... now as vectors of the entire data frame size 
distance <- rexp(size,rate=1/200) 
theta <- runif(size, 0, 2*pi) 
C <- cos(theta) 
S <- sin(theta) 
#XY coords (meters) using polar coordinate transformations 
X <- X_temp + S * distance 
Y <- Y_temp + C * distance 
pop <- data.frame(X,Y) 
pop$Sex <- rbinom(size,1,male_prob) 
pop$ID <- seq_len(nrow(pop)) # seq_len() also works when pop has zero rows
# now round... once 
pop$X <- round(pop$X,2) 
pop$Y <- round(pop$Y,2) 
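One possible alternative to the two nested ifelse() calls, sketched here rather than taken from the answer, is to classify ages with cut() and look up alates by class. Ages outside 4 to 20 then map to NA instead of silently getting 100000, so the per-colony counts always sum to size. The names age_class and keep are hypothetical.

```r
age <- c(15, 4, 13, 5, 2, 5, 1, 20, 13, 3)   # sample ages from the question
survival_prob <- 0.01

# [4,10) -> 1, [10,15) -> 2, [15,20] -> 3, anything else -> NA
# (with right = FALSE, include.lowest = TRUE closes the top break, so 20 is kept)
age_class <- cut(age, breaks = c(4, 10, 15, 20), right = FALSE,
                 include.lowest = TRUE, labels = FALSE)
alates <- c(1000, 10000, 100000)[age_class]

# Drop colonies that produce nothing, then compute offspring per colony
keep <- !is.na(alates)
indiv <- alates[keep] * survival_prob
print(sum(indiv))
# [1] 2230
```

The same keep vector would then be used to subset colonies$X and colonies$Y before calling rep().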

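Also, you might want to note that even if this couldn't be vectorized, the solution to your problem of assigning values into pop is very simple: don't. Just use lapply with a function that returns a data.frame, then rbind the resulting list of data.frame objects.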
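A minimal sketch of that lapply approach, assuming a hypothetical two-colony input with a precomputed indiv column: each call returns one colony's offspring as a data.frame, and do.call(rbind, ...) stacks the whole list in a single step instead of growing pop inside a loop.

```r
# Hypothetical toy input standing in for the filtered colonies
colonies <- data.frame(X = c(582906.4, 583345.9), Y = c(2883317, 2883102),
                       indiv = c(3, 2))

# Simulate one colony's offspring and return them as a data.frame
simulate_colony <- function(i) {
  n <- colonies$indiv[i]
  distance <- rexp(n, rate = 1/200)
  theta <- runif(n, 0, 2 * pi)
  data.frame(X = round(colonies$X[i] + sin(theta) * distance, 2),
             Y = round(colonies$Y[i] + cos(theta) * distance, 2))
}

set.seed(42)
pieces <- lapply(seq_len(nrow(colonies)), simulate_colony)
pop <- do.call(rbind, pieces)   # one rbind over the whole list
pop$ID <- seq_len(nrow(pop))
```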


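Thanks John! The lapply version is something I had already tested, but stacking all the list elements together made it take longer than my current loop... The solution you wrote here works perfectly, though... I really appreciate it... Is there any chance I could contact you by email? Francesco – Francesco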