What is itertools.groupby() used for?

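While reading the Python documentation, I came across the itertools.groupby() function. It was not very straightforward, so I decided to look for some information here on Stack Overflow. I found something in "How do I use Python's itertools.groupby()?".

There seems to be little information about it here and in the documentation, so I decided to post my observations.

Thanks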

Source

2016-12-31 Parousia

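Did you check the groupby() documentation (https://docs.python.org/2/library/itertools.html#itertools.groupby)? Which part is not straightforward?

@MoinuddinQuadri The first sentence of the OP's question says that they read the Python documentation.

You ask a question and already have a detailed answer prepared? Really? Why not keep everything that is a question in the question and leave the answer section for discussion?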

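First of all, you can read the documentation here.

I will put what I consider the most important point first. I hope the reason becomes clear through the examples.

ALWAYS SORT ITEMS WITH THE SAME KEY THAT IS USED FOR GROUPING, SO AS TO AVOID UNEXPECTED RESULTS

itertools.groupby(iterable, key=None or some func) takes a list of iterables and groups them based on a specified key. The key specifies what operation to apply to each individual item; the result is then used as the heading for that item's group. Adjacent items that produce the same "key" value end up in the same group.

The return value is an iterator of (key, group) pairs, which resembles a dictionary in that it has the form {key : value}.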
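To make the shape of the return value concrete, here is a minimal sketch. The sample list `data` is made up for illustration. It suggests that groupby hands back a lazy iterator of (key, group) pairs rather than an actual dictionary, and that each group is itself an iterator.

```python
from itertools import groupby

data = ['a', 'a', 'b']  # hypothetical sample input

result = groupby(data)

# groupby returns an iterator, not a dict
is_dict = isinstance(result, dict)

# each element is a (key, group) pair; list() materializes the group
pairs = [(key, list(group)) for key, group in result]

print(is_dict)  # False
print(pairs)    # [('a', ['a', 'a']), ('b', ['b'])]
```

Because the result is an iterator, it can only be walked once; building a dict or list from it, as the examples below do, keeps the groups around.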

Example 1

from itertools import groupby

# note here that the tuple counts as one item in this list. I did not
# specify any key, so each item in the list is a key on its own.
c = groupby(['goat', 'dog', 'cow', 1, 1, 2, 3, 11, 10, ('persons', 'man', 'woman')])
dic = {}
for k, v in c:
    dic[k] = list(v)
dic

Result

{1: [1, 1], 
'goat': ['goat'], 
3: [3], 
'cow': ['cow'], 
('persons', 'man', 'woman'): [('persons', 'man', 'woman')], 
10: [10], 
11: [11], 
2: [2], 
'dog': ['dog']}

Example 2

# notice here that mulato, cow and cat don't show up. only the last group with a
# certain key survives, because it replaces the earlier result in the dict.
# the last group for 'c' (camel) wipes out the earlier ['cow', 'cat'] group.

list_things = ['goat', 'dog', 'donkey', 'mulato', 'cow', 'cat', ('persons', 'man', 'woman'),
               'wombat', 'mongoose', 'malloo', 'camel']
c = groupby(list_things, key=lambda x: x[0])
dic = {}
for k, v in c:
    dic[k] = list(v)
dic

Result

{'c': ['camel'], 
'd': ['dog', 'donkey'], 
'g': ['goat'], 
'm': ['mongoose', 'malloo'], 
'persons': [('persons', 'man', 'woman')], 
'w': ['wombat']}

Now for the sorted version

# but observe the sorted version, where the data is first sorted on the same key used for grouping
list_things = ['goat', 'dog', 'donkey', 'mulato', 'cow', 'cat', ('persons', 'man', 'woman'),
               'wombat', 'mongoose', 'malloo', 'camel']
sorted_list = sorted(list_things, key=lambda x: x[0])
print(sorted_list)
print()
c = groupby(sorted_list, key=lambda x: x[0])
dic = {}
for k, v in c:
    dic[k] = list(v)
dic

Result

['cow', 'cat', 'camel', 'dog', 'donkey', 'goat', 'mulato', 'mongoose', 'malloo', ('persons', 'man', 'woman'), 'wombat'] 
{'c': ['cow', 'cat', 'camel'], 
'd': ['dog', 'donkey'], 
'g': ['goat'], 
'm': ['mulato', 'mongoose', 'malloo'], 
'persons': [('persons', 'man', 'woman')], 
'w': ['wombat']}
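The sort-then-group pattern above can be wrapped in a small helper so that the same key function is always used for both steps. This is only a sketch: the name `group_sorted` and the sample list `animals` are made up for illustration and are not part of itertools.

```python
from itertools import groupby

def group_sorted(items, key):
    """Sort items by key, then group them by the same key into a dict."""
    ordered = sorted(items, key=key)  # sorted() is stable, so input order is kept within a group
    return {k: list(g) for k, g in groupby(ordered, key=key)}

animals = ['goat', 'dog', 'donkey', 'mulato', 'cow', 'cat', 'camel']  # hypothetical input
grouped = group_sorted(animals, key=lambda x: x[0])

print(grouped['c'])  # ['cow', 'cat', 'camel']
print(grouped['m'])  # ['mulato']
```

Passing a single `key` argument to both `sorted` and `groupby` removes the risk of sorting on one key and grouping on another.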

Example 3

things = [("animal", "bear"), ("animal", "duck"), ("plant", "cactus"), ("vehicle", "harley"),
          ("vehicle", "speed boat"), ("vehicle", "school bus")]
dic = {}
f = lambda x: x[0]
for key, group in groupby(sorted(things, key=f), f):
    dic[key] = list(group)
dic

Result

{'animal': [('animal', 'bear'), ('animal', 'duck')], 
'plant': [('plant', 'cactus')], 
'vehicle': [('vehicle', 'harley'), 
    ('vehicle', 'speed boat'), 
    ('vehicle', 'school bus')]}

Now for the sorted version. I changed the tuples to lists here. Either way, the result is the same.

things = [["animal", "bear"], ["animal", "duck"], ["vehicle", "harley"], ["plant", "cactus"],
          ["vehicle", "speed boat"], ["vehicle", "school bus"]]
dic = {}
f = lambda x: x[0]
for key, group in groupby(sorted(things, key=f), f):
    dic[key] = list(group)
dic

Result

{'animal': [['animal', 'bear'], ['animal', 'duck']], 
'plant': [['plant', 'cactus']], 
'vehicle': [['vehicle', 'harley'], 
    ['vehicle', 'speed boat'], 
    ['vehicle', 'school bus']]}

Source

2016-12-31 20:26:04 Parousia

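"itertools.groupby(iterable, key=None or some func) takes a list of iterables" Does it take a list of iterables, or just an iterable? A list is an iterable. – Tagc

The documentation does not state it explicitly. But from the examples I posted, you can see that I used lists and nested lists. So it can take an "iterable" (Example 1) as well as a "list of iterables" (Example 2). You can even pass in a string and you are still in business – Parousia

As always, the documentation of the function should be the first place to check. However, itertools.groupby is certainly one of the trickiest itertools because it has some possible pitfalls:

It only groups items whose key result is the same for successive items: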

from itertools import groupby 

for key, group in groupby([1,1,1,1,5,1,1,1,1,4]): 
    print(key, list(group)) 
# 1 [1, 1, 1, 1] 
# 5 [5] 
# 1 [1, 1, 1, 1] 
# 4 [4]

One can apply sorted beforehand if one wants an overall groupby.
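As a minimal sketch of that suggestion, here is the same list from the previous snippet sorted before grouping. The variable names are chosen for illustration only.

```python
from itertools import groupby

values = [1, 1, 1, 1, 5, 1, 1, 1, 1, 4]

# sorting first puts all equal items next to each other,
# so each key shows up in exactly one group
grouped = [(key, list(group)) for key, group in groupby(sorted(values))]

for key, items in grouped:
    print(key, items)
# 1 [1, 1, 1, 1, 1, 1, 1, 1]
# 4 [4]
# 5 [5]
```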

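It yields two items, and the second one is an iterator (so one needs to iterate over the second item!). In the previous example I explicitly had to convert the groups to a list.

The second yielded element is discarded if one advances the groupby iterator: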

it = groupby([1,1,1,1,5,1,1,1,1,4]) 
key1, group1 = next(it) 
key2, group2 = next(it) 
print(key1, list(group1)) 
# 1 []

Even though group1 is not empty!
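One way to avoid this, sketched here with the same input, is to materialize each group before asking the groupby iterator for the next one. The names `items1` and `items2` are illustrative.

```python
from itertools import groupby

it = groupby([1, 1, 1, 1, 5, 1, 1, 1, 1, 4])

key1, group1 = next(it)
items1 = list(group1)  # consume the first group before advancing the outer iterator

key2, group2 = next(it)
items2 = list(group2)

print(key1, items1)  # 1 [1, 1, 1, 1]
print(key2, items2)  # 5 [5]
```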

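As already mentioned, one can use sorted to do an overall groupby operation, but that is extremely inefficient (and it throws away the memory efficiency if you want to use groupby on generators). There are better alternatives available if you cannot guarantee that the input is sorted (and they do not require the O(n log(n)) sorting time overhead).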
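As one illustration of grouping without sorting, here is a sketch based on collections.defaultdict. This is an assumption on my part about what such an alternative could look like, not necessarily one the answer had in mind. The helper name `group_unsorted` and the sample list are hypothetical.

```python
from collections import defaultdict

def group_unsorted(iterable, key):
    """Group items by key in a single pass, without sorting the input first."""
    groups = defaultdict(list)
    for item in iterable:
        groups[key(item)].append(item)  # the key result must be hashable
    return dict(groups)

things = ['goat', 'dog', 'cow', 'donkey', 'cat']  # hypothetical, deliberately unsorted
grouped = group_unsorted(things, key=lambda x: x[0])

print(grouped)
# {'g': ['goat'], 'd': ['dog', 'donkey'], 'c': ['cow', 'cat']}
```

This runs in a single O(n) pass and keeps items in their input order within each group, at the cost of holding every group in memory at once.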

However, it is great for checking local properties. There are two recipes in the itertools-recipes section:

def all_equal(iterable): 
    "Returns True if all the elements are equal to each other" 
    g = groupby(iterable) 
    return next(g, True) and not next(g, False)

and:

from operator import itemgetter

def unique_justseen(iterable, key=None):
    "List unique elements, preserving order. Remember only the element just seen."
    # unique_justseen('AAAABBBCCDAABBB') --> A B C D A B
    # unique_justseen('ABBCcAD', str.lower) --> A B C A D
    return map(next, map(itemgetter(1), groupby(iterable, key)))
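For reference, here is a short usage sketch of both recipes, assuming Python 3 (where map is lazy). The recipes are repeated so that the snippet runs on its own, and the sample inputs are made up or taken from the recipe comments.

```python
from itertools import groupby
from operator import itemgetter

def all_equal(iterable):
    "Returns True if all the elements are equal to each other"
    g = groupby(iterable)
    return next(g, True) and not next(g, False)

def unique_justseen(iterable, key=None):
    "List unique elements, preserving order. Remember only the element just seen."
    return map(next, map(itemgetter(1), groupby(iterable, key)))

same = all_equal([3, 3, 3])      # a single run of equal items
mixed = all_equal([3, 3, 4])     # two runs, so not all equal
runs = list(unique_justseen('AAAABBBCCDAABBB'))
runs_ci = list(unique_justseen('ABBCcAD', str.lower))

print(same)     # True
print(mixed)    # False
print(runs)     # ['A', 'B', 'C', 'D', 'A', 'B']
print(runs_ci)  # ['A', 'B', 'C', 'A', 'D']
```

Both recipes rely only on whether neighbouring items share a key, which is why they work on unsorted input.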

Source

2016-12-31 21:06:32 MSeifert

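Thanks. I will definitely keep this in mind if I need some alternatives. For now I am reading through the documentation section by section so as not to mix everything up. Happy new year to you – Parousia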
