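I want to subset a DNAStringSet so that it contains only the sequences with a given substring, and then delete that substring from them. I can do the first part, but I don't know how to remove the subpattern from the subsetted DNAStringSet in R.
Here is an example: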
library(Biostrings)
myseq <- DNAStringSet(c("CCCATGAAAGATCGGAAGAGCACACGTCTGAACCCATGAA", "CCCATGAACATAGATCC", "CCCGTACAGATCACGTG"))
names(myseq) <- letters[1:3]
myseq
A DNAStringSet instance of length 3
    width seq                                      names
[1]    40 CCCATGAAAGATCGGAAGAGCACACGTCTGAACCCATGAA a
[2]    17 CCCATGAACATAGATCC                        b
[3]    17 CCCGTACAGATCACGTG                        c
The sequence I want to delete is AGATCGGAAGAGCACACGTCTGAA, which occurs in the first sequence:
matchPattern("AGATCGGAAGAGCACACGTCTGAA", myseq[[1]])
Views on a 40-letter DNAString subject
subject: CCCATGAAAGATCGGAAGAGCACACGTCTGAACCCATGAA
views:
    start end width
[1]     9  32    24 [AGATCGGAAGAGCACACGTCTGAA]
To subset, I do the following:
pat <- vmatchPattern("AGATCGGAAGAGCACACGTCTGAA", myseq)
myseq[ lapply(lapply(pat, isEmpty), function(x) x == FALSE) ]
A DNAStringSet instance of length 3
    width seq                                      names
[1]    40 CCCATGAAAGATCGGAAGAGCACACGTCTGAACCCATGAA a
[2]     0                                          b
[3]     0                                          c
The output should be:
A DNAStringSet instance of length 3
    width seq              names
[1]    16 CCCATGAACCCATGAA a
[2]     0                  b
[3]     0                  c
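One possible way to get there, offered as a minimal sketch rather than the canonical Biostrings approach: detect which sequences contain the pattern with `vcountPattern()`, then delete the pattern from the character representation with base R's `gsub(..., fixed = TRUE)` and rebuild the set. The variable names `adapter`, `has_hit`, `trimmed` and `result` are made up for illustration, and the sketch assumes exact matches only (no mismatches or IUPAC ambiguity codes).

```r
library(Biostrings)

myseq <- DNAStringSet(c("CCCATGAAAGATCGGAAGAGCACACGTCTGAACCCATGAA",
                        "CCCATGAACATAGATCC",
                        "CCCGTACAGATCACGTG"))
names(myseq) <- letters[1:3]

# Hypothetical name for the substring to remove
adapter <- "AGATCGGAAGAGCACACGTCTGAA"

# TRUE for every sequence with at least one exact occurrence of the pattern
has_hit <- vcountPattern(adapter, myseq) > 0

# Work on plain character strings: delete every non-overlapping occurrence
# in sequences that match, and blank out the sequences that do not
seqs <- as.character(myseq)
trimmed <- ifelse(has_hit, gsub(adapter, "", seqs, fixed = TRUE), "")

# Rebuild the DNAStringSet and restore the original names
result <- DNAStringSet(trimmed)
names(result) <- names(myseq)
result
```

For sequence a this joins the 8 bases before the match to the 8 bases after it, while b and c end up with width 0. If the non-matching sequences should be dropped instead of kept as empty entries, `result[has_hit]` would do that.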