R: smart way to clean up a data frame

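I have a data frame with two columns. The first is an index column that indexes rows in a second data frame. Each of those rows contains a specific event, and which event it is, is encoded in the second column, here named code_start_stop.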

Example:

index <- c(769, 766, 810, 813, 830, 842, 842, 892, 907, 944) 
code_start_stop <- c(2006, 2001, 2004, 1001, 1004, 2001, 1001, 1006, 2004, 1004) 
replace_all <- data.frame(index, code_start_stop)

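There are pairs of start/stop codes, i.e. 2001 and 1001, 2002 and 1002, etc. The aim is that, if there are rows enclosed by a start marker (e.g. 2006) and the corresponding next stop marker (here 1006), these rows should be removed from the data frame. Please note that start and stop markers always come in pairs.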
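The start/stop convention described above can be decoded arithmetically, which also makes it easy to check the "always in pairs" claim. A minimal sketch, assuming every start code is 2000 + k and its stop code is 1000 + k (as in the example); the added columns `is_start`, `is_stop` and `pair` are hypothetical names for illustration:

```r
index <- c(769, 766, 810, 813, 830, 842, 842, 892, 907, 944)
code_start_stop <- c(2006, 2001, 2004, 1001, 1004, 2001, 1001, 1006, 2004, 1004)
replace_all <- data.frame(index, code_start_stop)

# Assumed convention: 2xxx = start marker, 1xxx = stop marker,
# and the last three digits identify the pair a marker belongs to
replace_all$is_start <- replace_all$code_start_stop %/% 1000 == 2
replace_all$is_stop  <- replace_all$code_start_stop %/% 1000 == 1
replace_all$pair     <- replace_all$code_start_stop %% 1000

# Check the stated assumption: per pair, #starts must equal #stops
starts <- table(factor(replace_all$pair[replace_all$is_start], levels = 1:6))
stops  <- table(factor(replace_all$pair[replace_all$is_stop],  levels = 1:6))
stopifnot(all(starts == stops))

print(replace_all)
```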

Any suggestions for a smart way to do this are appreciated. Thanks!

2016-04-08 Christine Blume

'index' and 'code_start_stop' have different lengths here, so 'replace_all' cannot be created with the current code. – alistaire

Your question is a bit confusing, so please correct me if I'm wrong. The following should work:

startm <- 2006 #startmarker 
endm <- 1006 #endmarker 

#look for row that contains markers 
index1 <- which(replace_all[,2] == startm) 
index2 <- which(replace_all[,2] == endm) 

#subset accordingly 
replace_all <- replace_all[-(index1:index2),]

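Note: this also removes the rows containing the markers. If you only want to remove the rows between the markers, add +1/-1 in the subsetting step.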
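The +1/-1 variant mentioned in the note might look like this. It is a sketch that, like the answer above, assumes each marker occurs exactly once; the guard is an addition to avoid a backwards range such as 5:4 when the two markers sit in adjacent rows:

```r
index <- c(769, 766, 810, 813, 830, 842, 842, 892, 907, 944)
code_start_stop <- c(2006, 2001, 2004, 1001, 1004, 2001, 1001, 1006, 2004, 1004)
replace_all <- data.frame(index, code_start_stop)

startm <- 2006  # start marker
endm   <- 1006  # stop marker

# Row positions of the (single) start and stop marker
index1 <- which(replace_all[, 2] == startm)
index2 <- which(replace_all[, 2] == endm)

# Drop only the rows strictly between the markers; the markers stay.
# Without the guard, adjacent markers would give a backwards range.
if (index2 - index1 > 1) {
  replace_all <- replace_all[-((index1 + 1):(index2 - 1)), ]
}
print(replace_all)
```

For the example data this keeps the two 2006/1006 marker rows plus the rows after the 1006 marker.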

2016-04-08 17:40:16 maRtin

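Thanks a lot! However, I have several pairs of start and stop markers to begin with: 'startm1 <- 2001; endm1 <- 1001 .... startm6 <- 2006; endm6 <- 1006'. Moreover, each pair of markers may occur n times in the data frame (which is much larger than the example above). –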

You could simply loop over the pairs. – maRtin
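Looping over the pairs, as suggested, could look like the sketch below. It assumes (this is not stated in the thread) that pair k always uses start code 2000 + k and stop code 1000 + k for k = 1 to 6, and that occurrences of the same pair never overlap, so the i-th start marker belongs to the i-th stop marker. Markers are kept; only rows strictly between them are dropped:

```r
index <- c(769, 766, 810, 813, 830, 842, 842, 892, 907, 944)
code_start_stop <- c(2006, 2001, 2004, 1001, 1004, 2001, 1001, 1006, 2004, 1004)
replace_all <- data.frame(index, code_start_stop)

to_delete <- integer(0)
for (k in 1:6) {
  # Assumed codes: start = 2000 + k, stop = 1000 + k
  starts <- which(replace_all$code_start_stop == 2000 + k)
  stops  <- which(replace_all$code_start_stop == 1000 + k)
  stopifnot(length(starts) == length(stops))  # markers come in pairs
  for (i in seq_along(starts)) {
    # Collect rows strictly between the i-th start and the i-th stop
    if (stops[i] - starts[i] > 1) {
      to_delete <- c(to_delete, (starts[i] + 1):(stops[i] - 1))
    }
  }
}

# A row can lie inside several pairs at once, so drop duplicate positions
if (length(to_delete) > 0) {
  replace_all <- replace_all[-unique(to_delete), ]
}
print(replace_all)
```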

The solution is now based on maRtin's suggestion and seems to work well.

I do the following, going through all pairs of start and end markers:

to_delete <- c()
## Care = 2001/1001
startm1 <- 2001
endm1 <- 1001
index1 <- which(replace_all[, 2] == startm1)
index2 <- which(replace_all[, 2] == endm1)
if (length(index1) != 0) {
  for (i in seq_along(index1)) {
    # collect only the rows strictly between the i-th start and i-th stop marker
    if (index2[i] - index1[i] > 1) {
      to_delete <- c(to_delete, (index1[i] + 1):(index2[i] - 1))
    }
  }
}

... then go through all the other pairs of start/stop markers in the same way, and finally delete the rows in to_delete:

if (length(to_delete) != 0) {
  replace_all <- replace_all[-to_delete, ]
}
# keep only the index column (this returns a plain vector)
replace_all <- replace_all[, 1]
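An alternative sketch that avoids the inner loop over occurrences uses running counts. For each pair, cumsum(is_start) - cumsum(is_stop) is positive from a start marker up to, but not including, its stop marker, so a row lies strictly inside the pair when that count is positive and the row is not the start marker itself. This assumes codes 2000 + k / 1000 + k for k = 1 to 6 and that no stop marker appears before its start marker; the variable names are hypothetical, and unlike the snippet above it keeps the whole data frame:

```r
index <- c(769, 766, 810, 813, 830, 842, 842, 892, 907, 944)
code_start_stop <- c(2006, 2001, 2004, 1001, 1004, 2001, 1001, 1006, 2004, 1004)
replace_all <- data.frame(index, code_start_stop)

codes <- replace_all$code_start_stop
inside_any <- rep(FALSE, length(codes))
for (k in 1:6) {
  is_start <- codes == 2000 + k  # assumed start code of pair k
  is_stop  <- codes == 1000 + k  # assumed stop code of pair k
  # Number of currently open k-pairs, counting the current row
  depth <- cumsum(is_start) - cumsum(is_stop)
  # Strictly inside: a k-pair is open and this row is not its start marker
  inside_any <- inside_any | (depth > 0 & !is_start)
}

result <- replace_all[!inside_any, ]
print(result)
```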

2016-04-09 11:55:59
