Comparing rows of data in R

I have a dataset containing origin-destination data and some associated variables. It looks like this:

    "Origin"  "Destination"  "distance"  "volume"
    "A01"     "A01"          0.0         10
    "A02"     "A01"          1.2          9
    "A03"     "A01"          1.4         15
    "A01"     "A02"          1.2         16

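For each origin-destination pair, I want to compute a new variable based on the data in that row and on the data in a selection of other rows. For example: how many other origin zones sending traffic to the same destination have a volume greater than the focal pair? In this example, I would end up with the following for destination A01.

    "Origin"  "Destination"  "distance"  "volume"  "greater_flow"
    "A01"     "A01"          0.0         10        1
    "A02"     "A01"          1.2          9        2
    "A03"     "A01"          1.4         15        0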

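I have been trying to work something out with group_by and apply, but I can't figure out how to a) "fix" the value I want to use as the reference (for example, the volume from A01 to A01), b) restrict the comparison to rows with the same destination (A01), and c) repeat that comparison for all origin-destination pairs.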

Source

2015-10-20 B_Dabbler

Here is an answer in base R (using the apply family):

d <- data.frame(Origin = c("A01", "A02", "A03", "A01"), Destination = c("A01", "A01", "A01", "A02"), distance = c(0.0, 1.2, 1.4, 1.2), volume = c(10, 9, 15, 16)) 

# extracting entries with destination = A01 
d2 <- d[d[, "Destination"] == "A01", ] 

# calculating number of rows satisfying your condition 
# (compare the numeric column directly: apply() on a data frame coerces every value to character)
greater_flow <- sapply(d2$volume, function(v) sum(d2$volume > v))

# sticking things back together 
data.frame(d2, greater_flow) 

#   Origin Destination distance volume greater_flow
# 1    A01         A01      0.0     10            1
# 2    A02         A01      1.2      9            2
# 3    A03         A01      1.4     15            0
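
For a single destination, the same count can also be written without an explicit loop. This is only a minimal sketch, assuming the three volumes into destination A01 from the example data; outer() builds the full pairwise comparison matrix, which is fine for small groups but grows quadratically with the number of origins.

```r
# Volumes of the three flows into destination A01 (taken from the example data)
volume <- c(10, 9, 15)

# less_than[i, j] is TRUE when flow i is strictly smaller than flow j
less_than <- outer(volume, volume, "<")

# Row sums count, for each flow, how many flows are strictly greater
greater_flow <- rowSums(less_than)
greater_flow
# [1] 1 2 0
```

Because the comparison is strict, a flow is never counted against itself and tied volumes are not counted as greater.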

If you need to do the calculation for every possible destination, you can simply loop over unique(d[, "Destination"]):

output <- lapply(unique(d[, "Destination"]), FUN = function(dest) {
    # keep only the rows that share this destination
    d2 <- d[d[, "Destination"] == dest, ]
    # for each row, count the rows in the group with a strictly larger volume
    greater_flow <- sapply(d2$volume, function(v) sum(d2$volume > v))

    data.frame(d2, greater_flow)
})

If needed, you can bind the pieces of the output back together with do.call(rbind, output).
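
Putting the per-destination step and the final binding step together might look like the sketch below. It uses split() instead of looping over unique() values, and count_greater is a hypothetical helper name introduced only for this illustration, not a base R function.

```r
# Same toy data as in the question
d <- data.frame(
  Origin      = c("A01", "A02", "A03", "A01"),
  Destination = c("A01", "A01", "A01", "A02"),
  distance    = c(0.0, 1.2, 1.4, 1.2),
  volume      = c(10, 9, 15, 16)
)

# Hypothetical helper: for each value, count the values that are strictly greater
count_greater <- function(v) sapply(v, function(x) sum(v > x))

# One data frame per destination, each with its own greater_flow column
output <- lapply(split(d, d$Destination), function(d2) {
  d2$greater_flow <- count_greater(d2$volume)
  d2
})

# Stack the pieces back into a single data frame and reset the row names
result <- do.call(rbind, output)
rownames(result) <- NULL
result
```

The single A01 to A02 flow has no competitors for its destination, so its greater_flow is 0.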

Source

2015-10-20 15:19:30

Thanks, this is really helpful. My actual problem is more complex, but I can now see how to approach it.

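library(plyr)
# greater_flow = number of rows in the same destination group with a strictly larger volume
Fun <- function(x) { x$greater_flow <- nrow(x) - rank(x$volume, ties.method = "max"); x }
ddply(d, ~ Destination, .fun = Fun)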
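
A rank-based count per destination can also be sketched in base R with ave(), which avoids the plyr dependency and keeps the original row order. This is one possible variant rather than the only way to do it; it assumes the same toy data frame d as in the question.

```r
# Same toy data as in the question
d <- data.frame(
  Origin      = c("A01", "A02", "A03", "A01"),
  Destination = c("A01", "A01", "A01", "A02"),
  distance    = c(0.0, 1.2, 1.4, 1.2),
  volume      = c(10, 9, 15, 16)
)

# Within each destination, rank(..., ties.method = "max") gives the number of
# flows that are less than or equal to the current one, so subtracting it from
# the group size leaves the number of strictly greater flows.
d$greater_flow <- ave(
  d$volume, d$Destination,
  FUN = function(v) length(v) - rank(v, ties.method = "max")
)
d
```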

Source

2015-10-20 14:27:08 jogo
