2016-12-31 58 views
4

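While reading the Python documentation, I came across the itertools.groupby() function. It was not very straightforward, so I decided to look for some information here on Stack Overflow. I found something in How do I use Python's itertools.groupby()?. What is itertools.groupby() used for?

There seems to be little information about it both here and in the documentation, so I decided to post my observations for comment.

Thanks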

+0

Have you checked the [`groupby()` documentation](https://docs.python.org/2/library/itertools.html#itertools.groupby)? Which part is not straightforward? –

+0

@MoinuddinQuadri The first sentence of the OP's question says that they read the Python documentation. –

+0

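You asked a question and already had a detailed answer prepared? Really? Why not put all of this in the question and leave the answer section for discussion? –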

Answers

6

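First, you can read the documentation here.

I will put what I consider the most important point first. I hope the reason for it becomes clear from the examples.

ALWAYS SORT THE ITEMS WITH THE SAME KEY USED FOR GROUPING, SO AS TO AVOID UNEXPECTED RESULTS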

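itertools.groupby(iterable, key=None or some func) takes a list of iterables and groups them based on a specified key. The key specifies what operation to apply to each individual item; the result is then used as the heading for that item's group. Items that produce the same key value end up in the same group, provided they are adjacent in the input (hence the sorting advice above).

The return value is an iterator that resembles a dictionary, in that it yields pairs of the form {key : value}.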
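To make the shape of the return value concrete, here is a minimal sketch. The sample string and the variable names are my own and not part of the answer. It suggests that `groupby` accepts any iterable (a string here) and lazily yields `(key, group)` pairs in which each group is itself an iterator.

```python
from itertools import groupby

# Illustrative input: a string is an iterable of characters.
text = "aaabbc"

# Materialize each group with list() while iterating, because the
# group iterators share the underlying data with the groupby object.
pairs = [(key, list(group)) for key, group in groupby(text)]
print(pairs)
# [('a', ['a', 'a', 'a']), ('b', ['b', 'b']), ('c', ['c'])]
```

With no `key` argument, each element acts as its own key, which is the situation in Example 1.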

Example 1

from itertools import groupby

# Note that the tuple counts as one item in this list. I did not
# specify any key, so each item in the list is a key on its own.
c = groupby(['goat', 'dog', 'cow', 1, 1, 2, 3, 11, 10, ('persons', 'man', 'woman')])
dic = {}
for k, v in c:
    dic[k] = list(v)
dic

which results in

{1: [1, 1], 
'goat': ['goat'], 
3: [3], 
'cow': ['cow'], 
('persons', 'man', 'woman'): [('persons', 'man', 'woman')], 
10: [10], 
11: [11], 
2: [2], 
'dog': ['dog']} 

Example 2

# Notice that 'mulato', 'cow' and 'cat' don't show up. Only the last group
# with a given key survives, because it replaces the earlier dict entry:
# the last group for 'c' wipes out the two items collected for 'c' before it.

list_things = ['goat', 'dog', 'donkey', 'mulato', 'cow', 'cat', ('persons', 'man', 'woman'),
               'wombat', 'mongoose', 'malloo', 'camel']
c = groupby(list_things, key=lambda x: x[0]) 
dic = {} 
for k, v in c: 
    dic[k] = list(v) 
dic 

which results in

{'c': ['camel'], 
'd': ['dog', 'donkey'], 
'g': ['goat'], 
'm': ['mongoose', 'malloo'], 
'persons': [('persons', 'man', 'woman')], 
'w': ['wombat']} 
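To see where the missing items went, it can help to look at the groups before they are stored in a dict. The sketch below reuses the list from Example 2 and collects the `(key, group)` pairs in a list instead; `runs` is a name chosen purely for illustration. The keys `'m'` and `'c'` each occur twice, so assigning into a dict overwrites the earlier group.

```python
from itertools import groupby

list_things = ['goat', 'dog', 'donkey', 'mulato', 'cow', 'cat',
               ('persons', 'man', 'woman'),
               'wombat', 'mongoose', 'malloo', 'camel']

# Keep every run of consecutive items that share a key, in input order.
runs = [(k, list(v)) for k, v in groupby(list_things, key=lambda x: x[0])]
for k, v in runs:
    print(k, v)
# g ['goat']
# d ['dog', 'donkey']
# m ['mulato']
# c ['cow', 'cat']
# persons [('persons', 'man', 'woman')]
# w ['wombat']
# m ['mongoose', 'malloo']
# c ['camel']
```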

Now for the sorted version

# Observe the sorted version, where the data is first sorted on the same key used for grouping
list_things = ['goat', 'dog', 'donkey', 'mulato', 'cow', 'cat', ('persons', 'man', 'woman'),
               'wombat', 'mongoose', 'malloo', 'camel']
sorted_list = sorted(list_things, key = lambda x: x[0]) 
print(sorted_list) 
print() 
c = groupby(sorted_list, key=lambda x: x[0]) 
dic = {} 
for k, v in c: 
    dic[k] = list(v) 
dic 

which results in

['cow', 'cat', 'camel', 'dog', 'donkey', 'goat', 'mulato', 'mongoose', 'malloo', ('persons', 'man', 'woman'), 'wombat'] 
{'c': ['cow', 'cat', 'camel'], 
'd': ['dog', 'donkey'], 
'g': ['goat'], 
'm': ['mulato', 'mongoose', 'malloo'], 
'persons': [('persons', 'man', 'woman')], 
'w': ['wombat']} 

Example 3

things = [("animal", "bear"), ("animal", "duck"), ("plant", "cactus"), ("vehicle", "harley"),
          ("vehicle", "speed boat"), ("vehicle", "school bus")]
dic = {} 
f = lambda x: x[0] 
for key, group in groupby(sorted(things, key=f), f): 
    dic[key] = list(group) 
dic 

which results in

{'animal': [('animal', 'bear'), ('animal', 'duck')], 
'plant': [('plant', 'cactus')], 
'vehicle': [('vehicle', 'harley'), 
    ('vehicle', 'speed boat'), 
    ('vehicle', 'school bus')]} 

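Now for the sorted version, this time starting from input that is not already in key order. I changed the tuples to lists here. Either way, the result is the same.

things = [["animal", "bear"], ["animal", "duck"], ["vehicle", "harley"], ["plant", "cactus"],
          ["vehicle", "speed boat"], ["vehicle", "school bus"]]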
dic = {} 
f = lambda x: x[0] 
for key, group in groupby(sorted(things, key=f), f): 
    dic[key] = list(group) 
dic 

which results in

{'animal': [['animal', 'bear'], ['animal', 'duck']], 
'plant': [['plant', 'cactus']], 
'vehicle': [['vehicle', 'harley'], 
    ['vehicle', 'speed boat'], 
    ['vehicle', 'school bus']]} 
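
The sort-then-group pattern used in the examples above could be wrapped in a small helper. This is only a sketch: `group_sorted` is a hypothetical name, not something from `itertools`, and it assumes the key values can be compared with each other so that `sorted` works.

```python
from itertools import groupby

def group_sorted(items, key):
    """Sort items by key, then group them by the same key into a dict."""
    return {k: list(g) for k, g in groupby(sorted(items, key=key), key=key)}

things = [["animal", "bear"], ["animal", "duck"], ["vehicle", "harley"],
          ["plant", "cactus"], ["vehicle", "speed boat"],
          ["vehicle", "school bus"]]

grouped = group_sorted(things, key=lambda x: x[0])
print(grouped["vehicle"])
# [['vehicle', 'harley'], ['vehicle', 'speed boat'], ['vehicle', 'school bus']]
```

Because `sorted` is stable, items within each group keep their original relative order.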
+0

"'itertools.groupby(iterable, key=None or some func)' takes a list of iterables" Does it need a list of iterables, or just an iterable? A list is an iterable. – Tagc

+0

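The documentation does not state it explicitly. But from the examples I posted, you can see that I used lists and nested lists. So it can take an "iterable" (Example 1) as well as a "list of iterables" (Example 2). You can even pass a string and you are still in business. – Parousia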

2

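As always, the documentation of the function should be the first place to check. However, itertools.groupby is certainly one of the trickiest itertools because it has some possible pitfalls:

  • It only groups items if their key result is the same for successive items: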

    from itertools import groupby 
    
    for key, group in groupby([1,1,1,1,5,1,1,1,1,4]): 
        print(key, list(group)) 
    # 1 [1, 1, 1, 1] 
    # 5 [5] 
    # 1 [1, 1, 1, 1] 
    # 4 [4] 
    

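    One can call sorted beforehand if one wants to do an overall groupby.

  • It yields two items, and the second one is an iterator (so one needs to iterate over the second item!). In the previous example I explicitly needed to cast them to a list.

  • The second yielded element is discarded if one advances the groupby iterator: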

    it = groupby([1,1,1,1,5,1,1,1,1,4]) 
    key1, group1 = next(it) 
    key2, group2 = next(it) 
    print(key1, list(group1)) 
    # 1 [] 
    

    Even though group1 was not empty before the iterator was advanced!
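
A sketch of one way to avoid the last pitfall: consume each group before asking the groupby iterator for the next one. It uses the same data as above; the names `first` and `second` are my own.

```python
from itertools import groupby

data = [1, 1, 1, 1, 5, 1, 1, 1, 1, 4]

it = groupby(data)
key1, group1 = next(it)
first = list(group1)    # consume the group *before* advancing the iterator
key2, group2 = next(it)
second = list(group2)

print(key1, first)   # 1 [1, 1, 1, 1]
print(key2, second)  # 5 [5]
```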

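It was already mentioned that one can use sorted to do an overall groupby operation, but that is extremely inefficient (and it throws away the memory efficiency if you want to use groupby on generators). There are better alternatives available if you cannot guarantee that the input is sorted (which also do not require the O(n log(n)) sorting time overhead).

However, groupby is great for checking local properties. There are two recipes for this in the itertools-recipes section: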
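
One commonly used alternative, shown here only as a sketch and not necessarily the one this answer had in mind, is to collect the items into a `collections.defaultdict` in a single pass. It needs no sorting and also works on generators. `group_unsorted` is a hypothetical helper name.

```python
from collections import defaultdict

def group_unsorted(iterable, key=lambda x: x):
    """Group all items by key in one pass, without sorting (hypothetical helper)."""
    groups = defaultdict(list)
    for item in iterable:
        groups[key(item)].append(item)
    return dict(groups)

result = group_unsorted([1, 1, 1, 1, 5, 1, 1, 1, 1, 4])
print(result)
# {1: [1, 1, 1, 1, 1, 1, 1, 1], 5: [5], 4: [4]}
```

The trade-off is that the keys must be hashable and all groups are held in memory at once.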

from itertools import groupby

def all_equal(iterable):
    "Returns True if all the elements are equal to each other"
    g = groupby(iterable)
    return next(g, True) and not next(g, False)

and:

from operator import itemgetter

def unique_justseen(iterable, key=None):
    "List unique elements, preserving order. Remember only the element just seen."
    # unique_justseen('AAAABBBCCDAABBB') --> A B C D A B
    # unique_justseen('ABBCcAD', str.lower) --> A B C A D
    return map(next, map(itemgetter(1), groupby(iterable, key)))
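
To illustrate how the two recipes behave, here is a small self-contained sketch. The recipe bodies are repeated from above so the block runs on its own; the sample inputs are my own.

```python
from itertools import groupby
from operator import itemgetter

def all_equal(iterable):
    "Returns True if all the elements are equal to each other"
    g = groupby(iterable)
    return next(g, True) and not next(g, False)

def unique_justseen(iterable, key=None):
    "List unique elements, preserving order. Remember only the element just seen."
    return map(next, map(itemgetter(1), groupby(iterable, key)))

print(all_equal([1, 1, 1]))   # True
print(all_equal([1, 2, 1]))   # False
print(list(unique_justseen('AAAABBBCCDAABBB')))     # ['A', 'B', 'C', 'D', 'A', 'B']
print(list(unique_justseen('ABBCcAD', str.lower)))  # ['A', 'B', 'C', 'A', 'D']
```

Both rely on the fact that groupby starts a new group whenever the key changes, which is exactly the "local" behaviour described earlier.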
+0

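Thank you. If I need some of the alternatives, I will certainly keep an eye out for them. For now I am reading the documentation section by section so that I don't mix everything up. Happy New Year to you – Parousia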