Splitting a DNA sequence with D

A DNA string consists of an alphabet of four characters: A, C, G, and T. Given the string

ATGTTTAAA

I would like to split it into its constituent codons

ATG TTT AAA 

    codons = ["ATG","TTT","AAA"]

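Codons encode the amino acids that make up proteins, and the code is redundant (http://en.wikipedia.org/wiki/DNA_codon_table).

I have a DNA string in D and would like to split it into a range of codons, and later translate/map the codons to amino acids.

std.algorithm has a splitter function that requires a delimiter, and the std.regex splitter function requires a regex to split the string. Is there an idiomatic approach to splitting a string without a delimiter?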

Source

2015-05-04 eastafri

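You mean you want to insert a delimiter ' ' after every 3 characters? – stakx

I would like to get a range of codons, i.e. one for every 3 characters. – eastafri

It looks like you are looking for chunks: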

import std.range : chunks; 
import std.encoding : AsciiString; 
import std.algorithm : map; 

AsciiString ascii(string literal) 
{ 
    return cast(AsciiString) literal; 
} 

void main() 
{ 
    auto input = ascii("ATGTTTAAA"); 
    auto codons = input.chunks(3); 
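    auto aminoacids = codons.map!(
     (codon) {
      if (codon == ascii("ATG"))
       return "M"; // methionine (start codon)
      // ... remaining codons go here
      return "?"; // placeholder so that every path returns a value
     }
    );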
}

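Note that I have used http://dlang.org/phobos/std_encoding.html#.AsciiString here instead of plain string literals. This avoids the costly UTF-8 decoding that is done for string and that is never needed for actual DNA sequences. I remember this making a notable performance difference in similar bioinformatics code before.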
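The lambda above only handles ATG. As a rough sketch of how the mapping step might be filled in, the following uses the same AsciiString cast and chunks(3), with a string switch for the lookup. The helper names aminoAcid and translate are hypothetical, the table lists only a few entries of the standard codon table, and 'X' is an assumed placeholder for anything not covered (including a short trailing chunk).

```d
import std.array : appender;
import std.encoding : AsciiString;
import std.range : chunks;
import std.stdio : writeln;

// Hypothetical helper: maps one codon to a one-letter amino acid code.
// Only a handful of entries from the standard codon table are listed;
// '*' marks a stop codon and 'X' marks anything this sketch does not cover.
char aminoAcid(string codon)
{
    switch (codon)
    {
        case "ATG": return 'M'; // methionine (start)
        case "TTT": return 'F'; // phenylalanine
        case "AAA": return 'K'; // lysine
        case "TAA", "TAG", "TGA": return '*'; // stop
        default: return 'X';
    }
}

// Hypothetical helper: translates a DNA string codon by codon.
string translate(string dna)
{
    auto ascii = cast(AsciiString) dna; // reinterprets the bytes, no copy
    auto protein = appender!string();
    foreach (codon; ascii.chunks(3))    // each chunk is a slice of ascii
        protein.put(aminoAcid(cast(string) codon));
    return protein.data;
}

void main()
{
    writeln(translate("ATGTTTAAA")); // prints MFK
}
```

A complete version would need all 64 codons in the switch, or a lookup table built from the codon table linked in the question.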

Source

2015-05-04 11:34:45

Thanks! The part about UTF-8 encoding sounds interesting! I wish there were more documentation on such tips and ideas! – eastafri

import std.algorithm; 
import std.regex; 
import std.stdio; 

int main() 
{ 
    auto seq = "ATGTTTAAA"; 
    auto rex = regex(r"[ACGT]{3}"); // all four bases: A, C, G, T

    auto codons = matchAll(seq, rex).map!"a[0]"; 

    writeln(codons); 

    return 0; 
}

Source

2015-05-04 11:14:50 dmakarov

If you just want groups of 3 characters, you can use std.range.chunks.

import std.conv : to;
import std.range : chunks;
import std.algorithm : map, equal;

void main()
{
    enum seq = "ATGTTTAAA";
    // chunks yields lazy character ranges; to!string turns each into a string
    auto codons = seq.chunks(3).map!(x => x.to!string);
    assert(codons.equal(["ATG", "TTT", "AAA"]));
}

The foreach element type of chunks here is Take!string, so you may or may not need the map!(x => x.to!string), depending on how you want to use the result.
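As one illustration of a case where the conversion is not needed: if the chunks are only compared and never stored, they can be left as lazy ranges and compared with std.algorithm.equal. This is a minimal sketch; the helper name countCodon is hypothetical.

```d
import std.algorithm : count, equal;
import std.range : chunks;
import std.stdio : writeln;

// Hypothetical helper: counts how often `target` occurs as a codon in `dna`.
size_t countCodon(string dna, string target)
{
    // Each chunk is a lazy character range rather than a string, so it is
    // compared with equal() instead of ==; no strings are allocated.
    return dna.chunks(3).count!(codon => codon.equal(target));
}

void main()
{
    writeln(countCodon("ATGTTTAAATTT", "TTT")); // prints 2
}
```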

For example, if you just want to print them:

foreach(codon ; "ATGTTTAAA".chunks(3)) { writeln(codon); }
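One detail worth keeping in mind with chunks: when the input length is not a multiple of 3, the last chunk is shorter than 3 characters. If incomplete codons should be ignored, one possible approach is to trim the input first. The sketch below assumes that dropping the trailing bases is the desired behavior; the helper name completeCodons is hypothetical.

```d
import std.algorithm : map;
import std.array : array;
import std.conv : to;
import std.range : chunks;
import std.stdio : writeln;

// Hypothetical helper: returns only the complete codons of `dna`.
string[] completeCodons(string dna)
{
    // Drop the one or two trailing bases that do not fill a codon.
    // Slicing by index is safe here because DNA strings are plain ASCII.
    auto usable = dna[0 .. $ - $ % 3];
    return usable.chunks(3).map!(c => c.to!string).array;
}

void main()
{
    writeln(completeCodons("ATGTTTAA")); // prints ["ATG", "TTT"]
}
```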

Source

2015-05-04 11:46:03 rcorre
