Removing dicts with near-duplicate values from a list of dicts - Python

I want to clean up a list of dicts according to the following rules:

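1) The list of dicts is already sorted, so earlier dicts are preferred.
2) For a later dict, if its ['name'] and ['code'] string values match the same keys of any earlier dict in the list, and the absolute value of the difference between the int(['cost']) of the two dicts is < 2, then that dict is considered a duplicate of the earlier dict and is removed from the list.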

Here is one dict from the list of dicts:

{
    'name': "ItemName",
    'code': "AAHFGW4S",
    'from': "NDLS",
    'to': "BCT",
    'cost': str(29.95)
}

What is the best way to remove such duplicates?


2011-04-14 Pranab

There may be a more pythonic way of doing this, but here is the basic pseudocode:

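def is_duplicate(a, b):
    # 'cost' is stored as a string such as "29.95", so convert via float before int
    return (a['name'] == b['name'] and a['code'] == b['code']
            and abs(int(float(a['cost'])) - int(float(b['cost']))) < 2)

newlist = []
for a in oldlist:
    isdupe = False
    for b in newlist:
        # compare only against the dicts that were already kept
        if is_duplicate(a, b):
            isdupe = True
            break
    if not isdupe:
        newlist.append(a)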
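
As a possible "more pythonic" variant of the same two-list idea, the inner loop can be replaced with any(). This is only a sketch: the sample data in oldlist is made up for illustration, and the cost is converted with int(float(...)) on the assumption that costs are stored as strings such as "29.95".

```python
def is_duplicate(a, b):
    # costs are strings such as "29.95", so go through float before int
    return (a['name'] == b['name'] and a['code'] == b['code']
            and abs(int(float(a['cost'])) - int(float(b['cost']))) < 2)

# hypothetical sample data, already sorted by preference
oldlist = [
    {'name': "ItemName", 'code': "AAHFGW4S", 'cost': "29.95"},
    {'name': "ItemName", 'code': "AAHFGW4S", 'cost': "30.50"},  # near-duplicate of the first
    {'name': "ItemName", 'code': "AAHFGW4S", 'cost': "35.00"},  # int cost differs by 6
    {'name': "ItemName", 'code': "ZZZZZZZZ", 'cost': "29.95"},  # different code
]

newlist = []
for a in oldlist:
    # keep a only if it duplicates none of the dicts kept so far
    if not any(is_duplicate(a, b) for b in newlist):
        newlist.append(a)

print([d['cost'] for d in newlist])  # ['29.95', '35.00', '29.95']
```

Like the explicit loop, this still compares each dict against every kept dict, so it is quadratic in the worst case.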


2011-04-14 19:26:10 Yasser

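While there are technically better approaches (especially Jochen's answer, which uses 'yield' to reduce memory usage on large lists), I prefer the readability of your approach. – Pranab 2011-04-17 20:59:23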

Kind of a convoluted question, but I think something like this would work:

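def remove_near_dupes(dictList):
    # indexes of later dicts found to duplicate an earlier, kept dict
    dupes = set()
    # iterate through the list of dicts, starting with the first
    for i, d in enumerate(dictList):
        if i in dupes:
            continue  # already removed, so it does not invalidate others
        # check against all of the other dicts "beneath" it
        for j in range(i + 1, len(dictList)):
            d2 = dictList[j]
            if (d['name'] == d2['name'] and d['code'] == d2['code']
                    and abs(int(float(d['cost'])) - int(float(d2['cost']))) < 2):
                dupes.add(j)
    return [d for i, d in enumerate(dictList) if i not in dupes]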


2011-04-14 19:25:18

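Thanks, Daniel. Yes, slicing the list was my first instinct too, but I think Yasser's idea of using two lists will end up being clearer when I revisit the code a year from now. What do you think? – Pranab 2011-04-15 08:28:17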

Since you say the cost is an integer, you can use:

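def neardup(items):
    # (name, code, int cost) keys that would duplicate an item already kept
    forbidden = set()
    for elem in items:
        # int(float(...)) because the cost is stored as a string such as "29.95"
        cost = int(float(elem['cost']))
        if (elem['name'], elem['code'], cost) not in forbidden:
            yield elem
            for diff in (-1, 0, 1):  # add all keys invalidated by this item
                forbidden.add((elem['name'], elem['code'], cost + diff))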

Here is a less tricky way that really computes the difference:

from collections import defaultdict

def neardup2(items):
    # this is a mapping `(name, code) -> [cost1, cost2, ... ]`
    forbidden = defaultdict(list)
    for elem in items:
        key = elem['name'], elem['code']
        curcost = float(elem['cost'])
        # an item is new if we never saw the key before
        if (key not in forbidden or
                # or if all the known costs differ by at least 2
                all(abs(cost - curcost) >= 2 for cost in forbidden[key])):
            yield elem
            forbidden[key].append(curcost)

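Both solutions avoid rescanning the whole list for every item. After all, the cost only becomes interesting when (name, code) is equal, so you can use a dictionary to look up all candidates quickly.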
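
One thing that may be worth checking before picking between neardup and neardup2 is that they read the rule slightly differently: the first compares costs after converting them to int, as the question words it, while the second compares the float values directly. The sketch below uses two hypothetical helper functions and made-up costs to show a case where the two readings disagree.

```python
def int_rule(cost_a, cost_b):
    # the question's wording: compare the costs after truncating to int
    return abs(int(float(cost_a)) - int(float(cost_b))) < 2

def float_rule(cost_a, cost_b):
    # what neardup2 does: compare the float values directly
    return abs(float(cost_a) - float(cost_b)) < 2

print(int_rule("29.95", "31.10"))    # False: 31 - 29 == 2
print(float_rule("29.95", "31.10"))  # True: the difference is about 1.15
```

Which reading is right depends on what the data is supposed to mean, so this is something to decide rather than a bug in either answer.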


2011-04-14 19:46:03

Thanks for introducing me to 'yield' and 'set()'. Your answers are always technically great! – Pranab 2011-04-15 08:30:39
