我試圖實現遺傳算法將DNA片段組裝成一個只給出光譜序列。我唯一的運營商是Edge Recombination,我確信它足以獲得相當不錯的結果。正確的邊緣重組交叉DNA裝配
但是......我無法擊敗80%(最佳分數的百分比),並且有500個碎片的實例可能需要2小時(如果在100次迭代中沒有改善,算法會停止)。我甚至執行它的權利?我沒有發現任何交叉運營商應該選擇匹配更好的元素(片段重疊 - 在大多數論文中,它只是我們挑選的),但是我實現了它,它選擇了最匹配的一個,如果有很多 - 隨機一個。沒有選擇最好的一個,它甚至不會達到40%。
我應該實施更多的交叉?或者它不是如何工作......或者我錯過了什麼?我在這一點上很絕望,等待24小時爲40個實例簡單的啓發式(山登山者)可以在5秒做...
有代碼(排列是一個洗牌的頻譜 - 字符串列表)
def crossover(g1, g2, arr, ind): # edge recombination operator
neigh_list = {} # adjacency list
length = len(g1.permutation) # expected length of a child
for i, base in enumerate(g1.permutation): # create nodes
neigh_list[base] = {g1.permutation[i - 1], g1.permutation[(i + 1) % length]}
for i, base in enumerate(g2.permutation): # add neighbours to each node
neigh_list[base].add(g2.permutation[i - 1])
neigh_list[base].add(g2.permutation[(i + 1) % length])
# a starting point of a child is a starting point of one of the parents
neigh_chosen = [g1.permutation[0], g2.permutation[0]][random.randint(0, 1)]
child = [neigh_chosen]
while len(child) < length: # run until child has desired length
min_length = 5 # each list has lower length than 5
min_neigh_list = []
for k in neigh_list: # for every node
if neigh_chosen in neigh_list[k]: # remove a chosen fragment from the node
neigh_list[k].remove(neigh_chosen)
for k in neigh_list[neigh_chosen]: # if a node is a neighbour of previously chosen
if len(neigh_list[k]) < min_length: # remember nodes with the fewest neighbours
min_length = len(neigh_list[k])
min_neigh_list = [k]
elif len(neigh_list[k]) == min_length:
min_neigh_list.append(k)
del neigh_list[neigh_chosen] # delete list of the chosen node
if len(min_neigh_list) > 0: # if the chosen node has any neighbours
# get the best match out of neighbours as next
max_overlap = overlap(neigh_chosen, max(min_neigh_list, key=lambda x: overlap(neigh_chosen, x)))
possibilities = list(filter(lambda x: overlap(neigh_chosen, x) == max_overlap, min_neigh_list))
neigh_chosen = possibilities[random.randint(0, len(possibilities) - 1)]
else:
# get the best match out of every node as next
max_overlap = overlap(neigh_chosen, max(neigh_list, key=lambda x: overlap(neigh_chosen, x)))
possibilities = list(filter(lambda x: overlap(neigh_chosen, x) == max_overlap, neigh_list))
neigh_chosen = possibilities[random.randint(0, len(possibilities) - 1)]
child.append(neigh_chosen) # add the node to the solution
arr[ind] = Gene(child)
你將很難得到這裏的答案,我認爲(或任何),因爲這是一個相當專業了。我對你正在做的事情很感興趣(我沒有說自己能夠回答),但有一件事可以幫助我,如果你堅持[PEP8](https://www.python.org/dev/peps/pep-0008 /)來打破這個巨大的代碼塊。儘管如此,我認爲通過啓發式方法需要一些實實在在的努力。 – roganjosh
您可能會在新的生物信息學堆棧交換站點上獲得很好的答案:https://area51.stackexchange.com/proposals/109245/bioinformatics – nbryans