Building an array from a list and a dictionary in Python

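I want to build a matrix from a list and then fill it with the values of a dict. It works on small data, but with larger data the computer crashes (out of RAM). My script is obviously too heavy, but I don't see how to improve it (this is my first time programming). Thanks.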

import numpy as np 
liste = ["a","b","c","d","e","f","g","h","i","j"] 

dico = {"a/b": 4, "c/d" : 2, "f/g" : 5, "g/h" : 2} 

#now i'd like to build a square array (liste x liste) and fill it up with the values of 
# my dict. 


def make_array(liste, dico):
    array1 = []
    for i in liste:
        liste_i = []  # each row of the array
        for j in liste:
            if dico.has_key(i + "/" + j):
                liste_i.append(dico[i + "/" + j])
            elif dico.has_key(j + "/" + i):
                liste_i.append(dico[j + "/" + i])
            else:
                liste_i.append(0)
        array1.append(liste_i)
    print array1
    matrix = np.array(array1)
    print matrix.shape  # shape is an attribute, not a method
    print matrix
    return matrix

make_array(liste,dico)

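Thank you very much for your answers. Using `in dico` or a list comprehension and speeding up the script was very helpful. But it seems my problem is actually caused by the following function: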

import collections
import scipy.cluster.hierarchy
import scipy.spatial.distance

def clustering(matrix, liste_globale_occurences, output2):
    most_common_groups = []
    Y = scipy.spatial.distance.pdist(matrix)
    Z = scipy.cluster.hierarchy.linkage(Y, 'average', 'euclidean')
    scipy.cluster.hierarchy.dendrogram(Z)
    clust_h = scipy.cluster.hierarchy.fcluster(Z, t=15, criterion='distance')
    print clust_h
    print len(clust_h)
    # keep the labels of the three largest clusters
    most_common = collections.Counter(clust_h).most_common(3)
    group1 = most_common[0][0]
    group2 = most_common[1][0]
    group3 = most_common[2][0]
    most_common_groups.append(group1)
    most_common_groups.append(group2)
    most_common_groups.append(group3)
    with open(output2, 'w') as results:  # here is the beginning of the problem
        for group in most_common_groups:
            for i, val in enumerate(clust_h):
                if group == val:
                    mise_en_page = "{0:36s} groupe co-occurences = {1:5s} \n"
                    results.write(mise_en_page.format(str(liste_globale_occurences[i]), str(val)))

With a smaller file I get correct results, for example:

contact a = groupe 2

contact b = groupe 2

contact c = groupe 2

contact d = groupe 2

contact e = groupe 3

contact f = groupe 3

But with a heavy file, I only get one example per group, repeated:

contact a = groupe 2

contact a = groupe 2

contact a = groupe 2

contact a = groupe 2

contact e = groupe 3

contact e = groupe 3

Source

2015-07-11 EL Walou

Can you explain *build a matrix from a list and then fill it with the values of a dict*? Maybe just show a simple example! – Kasramvd

Don't use 'has_key': it is deprecated in Python 2 and has been removed in Python 3. Use 'in dico' instead.
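For what it's worth, here is a minimal sketch of the same lookup done with `in` instead of `has_key`. The helper name `lookup` is made up for illustration and is not part of the original script; the snippet runs on both Python 2.7 and Python 3.

```python
dico = {"a/b": 4, "c/d": 2}

def lookup(dico, i, j):
    """Return the value stored for the pair (i, j) in either order, or 0."""
    key = i + "/" + j
    reverse_key = j + "/" + i
    if key in dico:            # replaces dico.has_key(key)
        return dico[key]
    if reverse_key in dico:    # replaces dico.has_key(reverse_key)
        return dico[reverse_key]
    return 0
```

The `in` test is a hash lookup on the dict, so it costs the same as `has_key` did; the change is only about staying compatible with Python 3.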

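You can create a matrix mat of len(liste) * len(liste) zeros, then go through your dico and split each key on '/': the value before the '/' gives the row and the value after the '/' gives the column. That way you don't need the 'has_key' lookup at all.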
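A rough sketch of that idea, written for Python 3 and assuming that every name in a key appears in liste and that names never contain '/' themselves. The `index` mapping and the symmetric fill (to mimic the question's "a/b or b/a" lookup) are my assumptions, not something spelled out above.

```python
import numpy as np

liste = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"]
dico = {"a/b": 4, "c/d": 2, "f/g": 5, "g/h": 2}

def make_array(liste, dico):
    # Map each name to its row/column position once, up front.
    index = {name: pos for pos, name in enumerate(liste)}
    # Start from an all-zero square matrix.
    mat = np.zeros((len(liste), len(liste)), dtype=int)
    # Loop over the dict entries only, not over every pair of names.
    for key, value in dico.items():
        row_name, col_name = key.split("/")
        r, c = index[row_name], index[col_name]
        mat[r, c] = value
        mat[c, r] = value  # assumed: the matrix is meant to be symmetric
    return mat

matrix = make_array(liste, dico)
```

This does len(dico) assignments instead of len(liste)² string concatenations and lookups. The matrix itself still needs len(liste)² cells, so it speeds up the fill but does not shrink the result. If both "a/b" and "b/a" are present with different values, whichever is visited last wins, which differs slightly from the original loop.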

Source

2015-07-11 10:45:25 zveryansky

Your problem looks like it is O(n^2), because you want every combination of liste with itself. So you have to have an inner loop.

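One thing you can try is to write each row to a file and then, in a new process, build the matrix from that file. The new process will use less memory, since it doesn't need to hold the large inputs liste and dico. So, something like this: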

def make_array(liste, dico):
    # Write the matrix to disk one row at a time instead of keeping it in memory.
    with open('/temp/matrix.txt', 'w') as f:
        for i in liste:
            for j in liste:
                # This is just short-circuit evaluation of logical or.
                # It takes the first value that is not empty, falling back to 0.
                f.write('%s ' % (dico.get(i+"/"+j) or dico.get(j+"/"+i) or 0))
            f.write('\n')

Then, once this has run, you can call

print np.loadtxt('/temp/matrix.txt', dtype=int)

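I've used short-circuit evaluation to cut down the lines of code in your if statements. In fact, if you use list comprehensions you can reduce the make_array function to this: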

def make_array(liste,dico): 
    return np.array([[dico.get(i+"/"+j) or dico.get(j+"/"+i) or 0 for j in liste] for i in liste])
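To make the short-circuit behaviour concrete, here is a small Python 3 sketch that compares the one-liner with an explicit if/elif version. The name `make_array_explicit` and the shortened sample data are only for illustration.

```python
import numpy as np

liste = ["a", "b", "c", "d"]
dico = {"a/b": 4, "c/d": 2}

def make_array(liste, dico):
    # `x or y or 0` returns the first truthy operand, else the final 0.
    return np.array([[dico.get(i + "/" + j) or dico.get(j + "/" + i) or 0
                      for j in liste] for i in liste])

def make_array_explicit(liste, dico):
    # Same lookup written out with membership tests.
    rows = []
    for i in liste:
        row = []
        for j in liste:
            if i + "/" + j in dico:
                row.append(dico[i + "/" + j])
            elif j + "/" + i in dico:
                row.append(dico[j + "/" + i])
            else:
                row.append(0)
        rows.append(row)
    return np.array(rows)

same = np.array_equal(make_array(liste, dico), make_array_explicit(liste, dico))
```

One caveat: `or` skips any falsy value, not just a missing key. If the dict held `{"a/b": 0, "b/a": 7}`, the one-liner would put 7 at position (a, b) while the explicit version would put 0. With data like the question's, where stored values are non-zero counts, the two agree.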

Source

2015-07-11 14:10:36 user2718281
