2015-10-20
4

Row-wise comparison of data in R

I have a dataset of origin-destination data with some associated variables. It looks like this:

"Origin","Destination","distance","volume" 
    "A01"  "A01"   0.0  10 
    "A02"  "A01"   1.2   9 
    "A03"  "A01"   1.4  15 
    "A01"  "A02"   1.2  16 

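For each origin-destination pair, I want to compute additional variables that depend both on that row's data and on a selection of other rows. For example: how many other origin zones sending traffic to the same destination have a larger volume than the focal pair? In this example, I would end up with the following for destination A01.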

"Origin","Destination","distance","volume","greater_flow" 
    "A01" "A01"   0.0  10   1 
    "A02" "A01"   1.2   9   2 
    "A03" "A01"   1.4  15   0 

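I have been trying to solve this with something like group_by and apply, but I can't work out how to a) "fix" the data I want to use as the reference (the volume from A01 to A01), b) restrict the comparison to rows with the same destination (A01), and c) repeat that comparison for all origin-destination pairs.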

Answers

1

Here is an answer in base R (using the apply family):

d <- data.frame(Origin = c("A01", "A02", "A03", "A01"), Destination = c("A01", "A01", "A01", "A02"), distance = c(0.0, 1.2, 1.4, 1.2), volume = c(10, 9, 15, 16)) 

# extracting entries with destination = A01 
d2 <- d[d[, "Destination"] == "A01", ] 

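# calculating number of rows satisfying your condition: for each row, count the
# rows with a strictly greater volume (apply() over a mixed-type data frame
# would coerce each row to character, so the numeric column is used directly)
greater_flow <- sapply(d2$volume, function(v) sum(d2$volume > v))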

# sticking things back together 
data.frame(d2, greater_flow) 

#   Origin Destination distance volume greater_flow
# 1    A01         A01      0.0     10            1
# 2    A02         A01      1.2      9            2
# 3    A03         A01      1.4     15            0
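
As a hand check of the counts above, the same comparison can be laid out as a pairwise matrix. This is only a sketch: the named vector vol is a hypothetical stand-in for the volumes of the three origins that go to A01, not part of the original answer.

```r
# Volumes into destination A01, keyed by origin (values from the example data)
vol <- c(A01 = 10, A02 = 9, A03 = 15)

# cmp[i, j] is TRUE when origin i has a strictly smaller volume than origin j
cmp <- outer(vol, vol, FUN = "<")

# Row sums count, for each origin, how many origins carry more traffic
greater_flow <- rowSums(cmp)
print(greater_flow)
```

With a strict comparison on numeric values, a row never counts itself.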

If you need to do the calculation for every possible destination, you can just loop over unique(d[, "Destination"]):

output <- lapply(unique(d[, "Destination"]), FUN = function(dest) {
    # keep only the rows for this destination
    d2 <- d[d[, "Destination"] == dest, ]
    # for each row, count the rows with a strictly greater volume
    greater_flow <- sapply(d2$volume, function(v) sum(d2$volume > v))

    data.frame(d2, greater_flow)
})

If needed, you can bind the pieces of the output back together with do.call(rbind, output).
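
The per-destination loop and the rbind step can also be collapsed into a single call. A minimal sketch, assuming base R's ave() is acceptable here; count_greater is a hypothetical helper name, and the data frame repeats the example data so the block runs on its own.

```r
d <- data.frame(Origin = c("A01", "A02", "A03", "A01"),
                Destination = c("A01", "A01", "A01", "A02"),
                distance = c(0.0, 1.2, 1.4, 1.2),
                volume = c(10, 9, 15, 16))

# Hypothetical helper: for each value, count the values in the same group
# that are strictly greater
count_greater <- function(v) sapply(v, function(x) sum(v > x))

# ave() applies the helper within each Destination and keeps the original row order
d$greater_flow <- ave(d$volume, d$Destination, FUN = count_greater)
print(d)
```

The result stays aligned with the rows of d, so no reassembly is needed afterwards.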


Thanks, this is really helpful. My actual problem is more complex, but I can now see how to approach it.

0
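library(plyr)
# Within each destination, count the rows with a strictly greater volume.
# rank(-volume, ties.method = "min") - 1 gives that count without reordering rows.
Fun <- function(x) { x$greater_flow <- rank(-x$volume, ties.method = "min") - 1; x }
ddply(d, ~ Destination, .fun = Fun)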
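
For solutions that rank or sort volumes within a destination, tied volumes are worth checking. A minimal sketch with made-up volumes (not from the question), comparing a direct count against rank() with ties.method = "min":

```r
# Made-up volumes for one destination, including a tie at 10
v <- c(10, 9, 15, 10)

# Direct count of strictly greater volumes for each element
direct <- sapply(v, function(x) sum(v > x))

# Rank-based count: the minimum rank of -v, minus one, equals the number of larger values
ranked <- rank(-v, ties.method = "min") - 1

print(direct)
print(ranked)
stopifnot(all(direct == ranked))
```

Numbering the sorted rows 0, 1, 2, and so on would instead give tied volumes different counts.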