2017-02-13

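Removing rows in a data frame that contain more than one character in a cell

I have a data frame with 8 columns and many rows. I want to remove the rows in which column 6 or column 7 (Reference, Alternate) holds a string longer than one character, and output a data frame in which columns 6 and 7 each contain only a single character.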

DF:

ID Content_ID Chromosome Start Stop Reference Alternate Length 
1299675221 backbone 12 99675221 99675221 GG T 0 
1298583685 backbone 12 98583685 98583685 C T 0 
129833474 backbone 12 9833474  9833474  C T 0 
1297722695 backbone 12 97722695 97722695 A G 0 
1297381269 backbone 12 97381269 97381269 T C 0 
1297081605 backbone 12 97081605 97081605 G AA 0 
1297058068 backbone 12 97058068 97058068 T C 0 
1295891848 backbone 12 95891848 95891848 CCTT ATA 0 
1294164312 backbone 12 94164312 94164312 T C 0 
12940191 backbone 12 940191  940191  T C 0 

Desired output:

ID Content_ID Chromosome Start Stop Reference Alternate Length 
1298583685 backbone 12 98583685 98583685 C T 0 
129833474 backbone 12 9833474  9833474  C T 0 
1297722695 backbone 12 97722695 97722695 A G 0 
1297381269 backbone 12 97381269 97381269 T C 0 
1297058068 backbone 12 97058068 97058068 T C 0 
1294164312 backbone 12 94164312 94164312 T C 0 
12940191 backbone 12 940191  940191  T C 0 

Answers


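We can loop over columns 6 and 7 with lapply and check whether the number of characters in each element is 1. Reduce with `&` then combines the two results element by element into a single logical vector, which is used to subset 'df'.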

df[Reduce(`&`, lapply(df[6:7], function(x) nchar(x)==1)),] 
#  ID Content_ID Chromosome Start  Stop Reference Alternate Length 
#2 1298583685 backbone   12 98583685 98583685   C   T  0 
#3 129833474 backbone   12 9833474 9833474   C   T  0 
#4 1297722695 backbone   12 97722695 97722695   A   G  0 
#5 1297381269 backbone   12 97381269 97381269   T   C  0 
#7 1297058068 backbone   12 97058068 97058068   T   C  0 
#9 1294164312 backbone   12 94164312 94164312   T   C  0 
#10 12940191 backbone   12 940191 940191   T   C  0 
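
For anyone who wants to run this step by step, here is a minimal sketch that rebuilds the sample data and splits the one-liner into its parts. The construction of df is an assumption (the question does not show how the data was read in), and so are the intermediate names start, checks, keep and result. stringsAsFactors = FALSE is set explicitly because nchar() expects character vectors rather than factors.

```r
# Rebuild the sample data from the question (assumed construction).
start <- c(99675221, 98583685, 9833474, 97722695, 97381269,
           97081605, 97058068, 95891848, 94164312, 940191)
df <- data.frame(
  ID = c(1299675221, 1298583685, 129833474, 1297722695, 1297381269,
         1297081605, 1297058068, 1295891848, 1294164312, 12940191),
  Content_ID = "backbone",
  Chromosome = 12,
  Start = start,
  Stop = start,
  Reference = c("GG", "C", "C", "A", "T", "G", "T", "CCTT", "T", "T"),
  Alternate = c("T", "T", "T", "G", "C", "AA", "C", "ATA", "C", "C"),
  Length = 0,
  stringsAsFactors = FALSE  # keep Reference/Alternate as character
)

# One logical vector per column: TRUE where the entry is a single character.
checks <- lapply(df[6:7], function(x) nchar(x) == 1)

# Combine the per-column vectors element-wise with `&`.
keep <- Reduce(`&`, checks)

# Keep only the rows where both columns passed the check.
result <- df[keep, ]
print(result)
```

Because Reduce folds over the whole list, the same pattern should extend to more than two columns by changing only the column index.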

Another option is rowSums:

df[!rowSums(nchar(as.matrix(df[6:7]))!=1),] 
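
The rowSums one-liner packs several steps together, so a rough breakdown on a small example may help. The data frame refalt and the names lens, bad and keep are made up for illustration; only the two columns of interest are reproduced.

```r
# Hypothetical four-row excerpt with just the two columns being checked.
refalt <- data.frame(
  Reference = c("GG", "C", "G", "T"),
  Alternate = c("T", "T", "AA", "C"),
  stringsAsFactors = FALSE
)

# as.matrix() gives a character matrix; nchar() keeps its dimensions,
# so lens is a matrix of string lengths with one row per record.
lens <- nchar(as.matrix(refalt))

# Count, per row, how many entries are NOT exactly one character long.
bad <- rowSums(lens != 1)

# `!` coerces the counts to logical: 0 becomes TRUE, anything else FALSE.
keep <- !bad

print(refalt[keep, ])
```

Rows 1 and 3 are dropped here because each has one entry longer than a single character.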

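Similarly, you could paste the columns together and then keep the rows where the number of characters equals 3: one for each column plus the separating space.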

df[nchar(paste(df$Reference, df$Alternate)) == 3,] 
      ID Content_ID Chromosome Start  Stop Reference Alternate Length 
2 1298583685 backbone   12 98583685 98583685   C   T  0 
3 129833474 backbone   12 9833474 9833474   C   T  0 
4 1297722695 backbone   12 97722695 97722695   A   G  0 
5 1297381269 backbone   12 97381269 97381269   T   C  0 
7 1297058068 backbone   12 97058068 97058068   T   C  0 
9 1294164312 backbone   12 94164312 94164312   T   C  0 
10 12940191 backbone   12 940191 940191   T   C  0 
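
One thing to keep in mind with the paste approach: it tests the combined length, so it cannot tell a one-character pair from an empty string next to a two-character string. The sketch below uses made-up values (empty strings do not occur in the question's sample data) to compare it with a per-column check; ref, alt, by_paste and by_column are illustrative names.

```r
# Hypothetical values, including empty strings that the sample data lacks.
ref <- c("C", "", "GG")
alt <- c("T", "AA", "")

# paste() joins with a single space: "C T", " AA", "GG "
pasted <- paste(ref, alt)

# Combined-length test from the answer above.
by_paste <- nchar(pasted) == 3

# Per-column test: each entry must be exactly one character.
by_column <- nchar(ref) == 1 & nchar(alt) == 1

print(by_paste)
print(by_column)
```

If empty entries cannot occur in the data, both tests should select the same rows.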

It is as simple as this using data.table:

library(data.table) 

setDT(df) 
df <- df[ nchar(Reference)==1 & nchar(Alternate)==1]