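Python: find duplicate dictionaries in a list and collapse them with a count

I have a list of dictionaries, and some of the dictionaries are identical. I want to find the duplicates and add them to a new list or dictionary together with the number of times each one occurs.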

import itertools

myListCombined = list()
for a, b in itertools.combinations(myList, 2):
    # Items of a that are missing from b; empty when every item of a is also in b
    is_equal = set(a.items()) - set(b.items())
    if len(is_equal) == 0:
        a.update(count=2)
        myListCombined.append(a)
    else:
        a.update(count=1)
        b.update(count=1)
        myListCombined.append(a)
        myListCombined.append(b)

# Drop entries that appear again later in the list
myListCombined = [i for n, i in enumerate(myListCombined) if i not in myListCombined[n + 1:]]

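This code partly works, but only when a dictionary appears exactly twice in the list; a.update(count=2) does not work for more repetitions than that. In the last line I also remove the duplicate dictionaries after separating them, but I'm not sure whether it works correctly.

Input: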

[{'name': 'Mary', 'age': 25, 'salary': 1000}, 
{'name': 'John', 'age': 25, 'salary': 2000}, 
{'name': 'George', 'age': 30, 'salary': 2500}, 
{'name': 'John', 'age': 25, 'salary': 2000}, 
{'name': 'John', 'age': 25, 'salary': 2000}]

Desired output:

[{'name': 'Mary', 'age': 25, 'salary': 1000, 'count':1}, 
{'name': 'John', 'age': 25, 'salary': 2000, 'count': 3}, 
{'name': 'George', 'age': 30, 'salary': 2500, 'count': 1}]

Source

2017-08-24 Korhan Yüzbaş

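Please post your input and expected output. – Ajax1234

Edited, thanks @Ajax1234 –

Please see my answer below. – Ajax1234

You can try the following, which first converts each dictionary to a frozenset of (key, value) tuples (to make them hashable, as required by collections.Counter).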

import collections 
a = [{'a':1}, {'a':1},{'b':2}] 
print(collections.Counter(map(lambda x: frozenset(x.items()),a)))

Edited to reflect the desired input/output:

from copy import deepcopy 

def count_duplicate_dicts(list_of_dicts): 
    cpy = deepcopy(list_of_dicts) 
    for d in list_of_dicts: 
        d['count'] = cpy.count(d)
    return list_of_dicts 

x = [{'a':1},{'a':1}, {'c':3}] 
print(count_duplicate_dicts(x))
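
Note that the function above annotates every dictionary, duplicates included, and writes 'count' into the input dictionaries themselves. A possible variant that keeps the same equality-based comparison but returns each distinct dictionary only once is sketched below. It is not part of the original answer, and count_unique_dicts is a hypothetical name chosen for illustration.

```python
def count_unique_dicts(list_of_dicts):
    """Return one copy of each distinct dict with a 'count' key added."""
    unique = []  # [dict, count] pairs, kept in first-seen order
    for d in list_of_dicts:
        for entry in unique:
            if entry[0] == d:
                entry[1] += 1
                break
        else:
            # No equal dict seen so far: start a new entry
            unique.append([d, 1])
    # dict(d, count=c) builds a new dict, so the input dicts are not modified
    return [dict(d, count=c) for d, c in unique]


data = [
    {'name': 'Mary', 'age': 25, 'salary': 1000},
    {'name': 'John', 'age': 25, 'salary': 2000},
    {'name': 'George', 'age': 30, 'salary': 2500},
    {'name': 'John', 'age': 25, 'salary': 2000},
    {'name': 'John', 'age': 25, 'salary': 2000},
]
result = count_unique_dicts(data)
print(result)
```

Because it compares with == instead of hashing, this sketch should also cope with unhashable values such as lists, at the cost of quadratic time on large inputs.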

Source

2017-08-24 18:38:05 Solaxun

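I got stuck when I used collections.Counter because dictionaries are not hashable. Thanks for your help! So, since a frozenset is not subscriptable, should I use 'dict(frozenset)['salary']' to reach the values? –

You can use collections.Counter to take the counts, and then rebuild the dictionaries after adding the count value to each frozenset from the Counter: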
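
On the question in the comment above: a frozenset of items cannot be indexed by key, but passing it to dict() rebuilds an ordinary dictionary. A minimal sketch, where record, key and restored are illustrative names only:

```python
record = {'name': 'John', 'age': 25, 'salary': 2000}

# Hashable stand-in for the dict, usable as a Counter key or set member
key = frozenset(record.items())

# A frozenset has no key lookup, so convert it back before indexing
restored = dict(key)
salary = restored['salary']
print(salary)
```

The key order of the rebuilt dictionary is not guaranteed to match the original, since a frozenset is unordered.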

from collections import Counter 

l = [dict(d | {('count', c)}) for d, c in Counter(frozenset(d.items()) 
                for d in myList).items()] 
print(l) 
# [{'salary': 1000, 'name': 'Mary', 'age': 25, 'count': 1}, 
# {'name': 'John', 'salary': 2000, 'age': 25, 'count': 3}, 
# {'salary': 2500, 'name': 'George', 'age': 30, 'count': 1}]

Source

2017-08-24 18:58:31

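If your dictionary data is well structured, the values are simple data types such as numbers and strings, and you have further data analysis to do afterwards, I suggest using pandas, which offers rich functionality. Here is sample code for your case:

In [32]: import pandas as pd
    ...: data = [{'name': 'Mary', 'age': 25, 'salary': 1000}, 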
    ...: {'name': 'John', 'age': 25, 'salary': 2000}, 
    ...: {'name': 'George', 'age': 30, 'salary': 2500}, 
    ...: {'name': 'John', 'age': 25, 'salary': 2000}, 
    ...: {'name': 'John', 'age': 25, 'salary': 2000}] 
    ...: 
    ...: df = pd.DataFrame(data) 
    ...: df['counts'] = 1 
    ...: df = df.groupby(df.columns.tolist()[:-1]).sum().reset_index(drop=False) 
    ...: 

In [33]: df 
Out[33]: 
   age    name  salary  counts
0   25    John    2000       3
1   25    Mary    1000       1
2   30  George    2500       1

In [34]: df.to_dict(orient='records') 
Out[34]: 
[{'age': 25, 'counts': 3, 'name': 'John', 'salary': 2000}, 
{'age': 25, 'counts': 1, 'name': 'Mary', 'salary': 1000}, 
{'age': 30, 'counts': 1, 'name': 'George', 'salary': 2500}]

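The logic is:

(1) First, build a DataFrame from the data.

(2) The groupby function can apply an aggregate function to each group.

(3) To convert the output back to dictionaries, you can call df.to_dict(orient='records').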
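
The three steps above can also be sketched as a plain script. This version is an assumption-laden variant rather than the answer's own code: it uses size() instead of a helper column of ones, passes sort=False so groups keep their first-seen order, and names the new column count to match the output requested in the question.

```python
import pandas as pd

data = [
    {'name': 'Mary', 'age': 25, 'salary': 1000},
    {'name': 'John', 'age': 25, 'salary': 2000},
    {'name': 'George', 'age': 30, 'salary': 2500},
    {'name': 'John', 'age': 25, 'salary': 2000},
    {'name': 'John', 'age': 25, 'salary': 2000},
]

# (1) Build a DataFrame from the list of dicts
df = pd.DataFrame(data)

# (2) Group on every column and count the rows in each group
counted = (
    df.groupby(list(df.columns), sort=False)
      .size()
      .reset_index(name='count')
)

# (3) Convert back to a list of dicts
records = counted.to_dict(orient='records')
print(records)
```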

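pandas is a big package and takes some time to learn, but it is really worth knowing. It is very powerful and can make your data analysis faster and more elegant.

Thanks.

Source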

2017-08-24 19:04:04 rojeeer

You can try this:

import collections 

d = [{'name': 'Mary', 'age': 25, 'salary': 1000}, 
{'name': 'John', 'age': 25, 'salary': 2000}, 
{'name': 'George', 'age': 30, 'salary': 2500}, 
{'name': 'John', 'age': 25, 'salary': 2000}, 
{'name': 'John', 'age': 25, 'salary': 2000}] 

count = dict(collections.Counter([i["name"] for i in d])) 
a = list(set(map(tuple, [i.items() for i in d]))) 
final_dict = [dict(list(i)+[("count", count[dict(i)["name"]])]) for i in a]

Output:

[{'salary': 2000, 'count': 3, 'age': 25, 'name': 'John'}, {'salary': 2500, 'count': 1, 'age': 30, 'name': 'George'}, {'salary': 1000, 'count': 1, 'age': 25, 'name': 'Mary'}]

Source

2017-08-24 19:07:09 Ajax1234
