2016-04-28
tiny_reads = [ 
Sequence('CGTGCAA'), 
Sequence('TGCAATG'), 
Sequence('ATGGCGT'), 
Sequence('GGCGTGC'), 
Sequence('CAATGGC'),] 



dictionary = {} 

def kmers(reads, k):
    for line in tiny_reads:
        for kmer in line.iter_kmers(k, overlap=3):
            dictionary[str(kmer)] = 1
            print(dictionary)
            if str(kmer) not in dictionary:
                dictionary[str(kmer)] = 1
            else:
                dictionary[str(kmer)] += 1


#print(dict) 
kmers(tiny_reads, 3) 
print(dictionary) 

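My code iterates over the sequences above and uses iter_kmers() to break each sequence into small reads of size 3 (e.g. 'CGT'). I want to build a dictionary that contains all of these small reads along with the number of times each one occurs across the sequences. The results I get are off, and I don't know why. In short: how do I update a value when its key is already in a Python dictionary?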
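For readers without the Sequence class at hand (the question doesn't say which library it comes from), overlapping k-mer iteration amounts to a sliding window of width k that moves one base at a time. A minimal sketch on plain strings; iter_kmers_str is a hypothetical stand-in for Sequence.iter_kmers, not that method itself:

```python
def iter_kmers_str(seq: str, k: int):
    """Hypothetical helper: yield every overlapping k-mer of seq (step of 1 base)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

print(list(iter_kmers_str('CGTGCAA', 3)))  # ['CGT', 'GTG', 'TGC', 'GCA', 'CAA']
```

Each 7-base read therefore contributes 5 k-mers of length 3, so the five reads give 25 k-mers in total.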

Expected result:

kmers(tiny_reads, 3) -> {'AAT': 2, 'ATG': 3, ... 'TGG': 2}

My result: {'CAA': 2, 'GTG': 2, 'GCA': 2, 'GCG': 2, 'ATG': 2, 'TGC': 2, 'CGT': 2, 'AAT': 2, 'GGC': 2, 'TGG': 2}

My result is incorrect, because 'ATG' actually occurs 3 times. Could you help me with this? It's getting frustrating.

Answers


I'm not sure exactly how iter_kmers works, but maybe you're looking for something like the following?

tiny_reads = [ 
    Sequence('CGTGCAA'), 
    Sequence('TGCAATG'), 
    Sequence('ATGGCGT'), 
    Sequence('GGCGTGC'), 
    Sequence('CAATGGC') 
] 

kmer_d = dict() 

def kmers(reads, k):
    for tiny_r in reads:
        for kmer in tiny_r.iter_kmers(k, overlap=3):
            # get() returns 0 when this k-mer has not been seen yet
            d = kmer_d.get(str(kmer), 0)
            kmer_d[str(kmer)] = d + 1


if __name__ == "__main__": 
    kmers(tiny_reads, 3) 
    print(kmer_d) 

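Keep in mind that this may not be the fastest implementation; it simply fixes the bug with minimal changes. When reading values from a dictionary, use the get() method, which lets you supply a default value for the case where no entry is found.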
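A self-contained sketch of the get() pattern, assuming plain strings in place of the Sequence objects and a hand-written sliding window in place of iter_kmers; count_kmers is a hypothetical name, and it returns a fresh dict instead of filling a global one:

```python
reads = ['CGTGCAA', 'TGCAATG', 'ATGGCGT', 'GGCGTGC', 'CAATGGC']

def count_kmers(reads, k):
    counts = {}
    for read in reads:
        # slide a window of width k one base at a time
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # get() supplies 0 the first time a k-mer is seen
            counts[kmer] = counts.get(kmer, 0) + 1
    return counts

counts = count_kmers(reads, 3)
print(counts['ATG'], counts['AAT'], counts['TGG'])  # 3 2 2
```

Returning a new dict also avoids counts piling up across repeated calls, which happens with a module-level dictionary.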


Thank you very much for your work – Mufassa


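You reset the counter in the dictionary on every k-mer you iterate through. The line dictionary[str(kmer)] = 1 runs before the "not in" check, so the key always exists, the else branch always fires, and every count ends at exactly 2.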
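To see the effect in isolation, here is a stripped-down reproduction of the question's loop on a short, made-up list of k-mer strings; buggy_count and fixed_count are hypothetical names used only for illustration:

```python
def buggy_count(kmers):
    d = {}
    for kmer in kmers:
        d[kmer] = 1            # unconditional reset: the key now always exists
        if kmer not in d:      # never true, so this branch is dead code
            d[kmer] = 1
        else:
            d[kmer] += 1       # always runs: 1 + 1 == 2
    return d

def fixed_count(kmers):
    d = {}
    for kmer in kmers:         # same logic with the reset line removed
        if kmer not in d:
            d[kmer] = 1
        else:
            d[kmer] += 1
    return d

seen = ['ATG', 'TGG', 'ATG', 'ATG']
print(buggy_count(seen))  # {'ATG': 2, 'TGG': 2}
print(fixed_count(seen))  # {'ATG': 3, 'TGG': 1}
```

Deleting the reset line alone is enough to make the original if/else correct.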

With the code you already have, I would use a defaultdict:

from collections import defaultdict

def kmers(reads, k):
    dictionary = defaultdict(int)  # missing keys start at int() == 0
    for line in reads:
        for kmer in line.iter_kmers(k, overlap=3):
            dictionary[str(kmer)] += 1
    return dictionary
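A runnable sketch of the defaultdict version, assuming plain strings and a hand-written sliding window instead of Sequence.iter_kmers. It also shows one behavior worth knowing: merely reading a missing key from a defaultdict inserts that key, so converting to a plain dict before doing lookups can be safer.

```python
from collections import defaultdict

# Plain strings stand in for the Sequence objects used above.
reads = ['CGTGCAA', 'TGCAATG', 'ATGGCGT', 'GGCGTGC', 'CAATGGC']

def kmers(reads, k):
    counts = defaultdict(int)  # int() == 0 is the starting value for new keys
    for read in reads:
        # overlapping window, one base at a time
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

counts = kmers(reads, 3)
result = dict(counts)          # plain dict copy, safe to look up freely
print(result['GGC'])           # 3
print('AAA' in counts)         # False
counts['AAA']                  # reading a missing key inserts it with value 0
print('AAA' in counts)         # True
```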

If I were writing the code myself, I would probably collect the k-mers from all the lines into one list and then use a Counter:

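from collections import Counter

def kmers(reads, k):
    accumulator = []
    for line in reads:
        # collect string k-mers so they can be hashed and counted
        accumulator.extend(str(kmer) for kmer in line.iter_kmers(k, overlap=3))
    dictionary = Counter(accumulator)
    return dictionary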
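The same Counter idea on plain strings, as a sketch that assumes a hand-written sliding window in place of iter_kmers. Passing a generator straight to Counter skips the intermediate accumulator list:

```python
from collections import Counter

reads = ['CGTGCAA', 'TGCAATG', 'ATGGCGT', 'GGCGTGC', 'CAATGGC']
k = 3

# Count every overlapping k-mer of every read in one pass.
counts = Counter(
    read[i:i + k]
    for read in reads
    for i in range(len(read) - k + 1)
)

print(counts['ATG'])  # 3
print(sorted(counts.items()))
# [('AAT', 2), ('ATG', 3), ('CAA', 3), ('CGT', 3), ('GCA', 2),
#  ('GCG', 2), ('GGC', 3), ('GTG', 2), ('TGC', 3), ('TGG', 2)]
```

Counter is a dict subclass, so the result can be used wherever the original dictionary was, and a missing k-mer simply reads as 0 without being inserted.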

Thank you very much – Mufassa