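Reducing duplicate items in a Python list of lists

I'm writing a program that reads in a large number of files and then indexes the terms in them. I'm able to read the files into a two-dimensional array (list) in Python, but I then need to remove the duplicates in the first column and store the index in a new column at the first occurrence of the duplicated word.

For example: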

['when', 1] 
['yes', 1] 
['', 1] 
['greg', 1] 
['17', 1] 
['when',2]

The first column is the term and the second is the DocID it came from. I would like to be able to change this to:

['when', 1, 2] 
['yes', 1] 
['', 1] 
['greg', 1] 
['17', 1]

with the duplicate removed.

This is what I have so far:

for j in range(0,len(index)): 
     for r in range(1,len(index)): 
       if index[j][0] == index[r][0]: 
         index[j].append(index[r][1]) 
         index.remove(index[r])

I keep getting an out-of-range error at

if index[j][0] == index[r][0]:

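which I think is because I'm removing objects from index, so it keeps getting smaller. Any ideas would be greatly appreciated. (And yes, I know I shouldn't modify the original, but this is just to test it on a small scale.)

Source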

2012-02-28 Andy Mitch

Wouldn't it be more appropriate to build a dict/defaultdict?

Something like:

from collections import defaultdict 

ar = [['when', 1], 
     ['yes', 1], 
     ['', 1], 
     ['greg', 1], 
     ['17', 1], 
     ['when',2]] 

result = defaultdict(list) 
for lst in ar: 
    result[lst[0]].append(lst[1])

Output:

>>> for k,v in result.items(): 
...  print(repr(k),v) 
'' [1] 
'yes' [1] 
'greg' [1] 
'when' [1, 2] 
'17' [1]
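
The plain dict variant mentioned above can be sketched as follows. This is a minimal sketch, assuming Python 3.7 or later (where dicts keep insertion order); the helper name `group_doc_ids` is hypothetical and not part of the original answer.

```python
def group_doc_ids(pairs):
    """Map each term to the list of DocIDs it appears with, in first-seen order."""
    grouped = {}
    for term, doc_id in pairs:
        # setdefault returns the existing list for this term,
        # or stores and returns a new empty list on first sight.
        grouped.setdefault(term, []).append(doc_id)
    return grouped


ar = [['when', 1], ['yes', 1], ['', 1], ['greg', 1], ['17', 1], ['when', 2]]
result = group_doc_ids(ar)
print(result['when'])  # [1, 2]
```

Compared with defaultdict, a plain dict does not create an empty entry when a missing term is merely looked up, which may matter if the index is queried later.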

Source

2012-02-28 16:20:05

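Yes, your error comes from modifying the list in place. Besides, your solution is inefficient for long lists. It is better to use a dictionary instead and convert it back to a list at the end: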

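from collections import defaultdict

od = defaultdict(list)

# Group the DocIDs by term; index is the list of [term, doc_id] pairs from the question.
for term, doc_id in index:
    od[term].append(doc_id)

# Convert back to a list of [term, doc_id, ...] rows.
result = [[term] + doc_ids for term, doc_ids in od.items()]

print(result)
# [['when', 1, 2], ['yes', 1], ['', 1], ['greg', 1], ['17', 1]]  (Python 3.7+, where dicts keep insertion order)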
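
To see why removing items in place triggers the error, the sketch below reproduces the question's loop on the sample data. `range(len(index))` is evaluated once, before any removal, so the loop variables keep running up to the original length while the list shrinks. The function name `merge_in_place` is hypothetical and used only for illustration.

```python
def merge_in_place(index):
    """Reproduce the question's loop; the loop bounds are fixed before the list shrinks."""
    for j in range(0, len(index)):
        for r in range(1, len(index)):
            if index[j][0] == index[r][0]:
                index[j].append(index[r][1])
                index.remove(index[r])


sample = [['when', 1], ['yes', 1], ['', 1], ['greg', 1], ['17', 1], ['when', 2]]
try:
    merge_in_place(sample)
    failed = False
except IndexError:
    # Raised once r reaches a position that no longer exists in the shorter list.
    failed = True
print(failed)  # True
```

The loop has a second problem: nothing prevents j and r from being equal, so an entry can match itself and be removed.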

Source

2012-02-28 16:26:10 DzinX

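Actually, you could do this with range() and len(). However, the beauty of Python is that you can iterate directly over the elements of a list, without indices.

Take a look at this code and try to understand it.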

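#!/usr/bin/env python3

def main():
    tot_array = [
        ['when', 1],
        ['yes', 1],
        ['', 1],
        ['greg', 1],
        ['17', 1],
        ['when', 2],
    ]

    # Build a new list instead of removing from tot_array while iterating
    # over it, which can skip elements.
    merged = []
    for term, doc_id in tot_array:
        for entry in merged:
            if entry[0] == term:
                # Term already seen: record the additional DocID.
                entry.append(doc_id)
                break
        else:
            # First occurrence of this term.
            merged.append([term, doc_id])
    print(merged)


if __name__ == '__main__':
    main()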

The output will look like:

*** Remote Interpreter Reinitialized *** 
>>> 
[['when', 1, 2], ['yes', 1], ['', 1], ['greg', 1], ['17', 1]]

Source

2012-02-28 16:56:50 Surya
