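While reading the Python documentation, I came across the itertools.groupby() function. It was not very straightforward, so I looked for information here on Stack Overflow and found something in "How do I use Python's itertools.groupby()?". What is itertools.groupby() used for?
There seems to be little information about it here and in the documentation, so I decided to post my observations.
Thanks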
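First, you can read the documentation here.
I will put what I consider the most important point first. I hope the reason becomes clear through the examples.
ALWAYS SORT THE ITEMS WITH THE SAME KEY THAT IS USED FOR GROUPING, SO AS TO AVOID UNEXPECTED RESULTS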
itertools.groupby(iterable, key=None or some func)
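It takes an iterable and groups its items based on a specified key. The key function specifies what to compute for each individual item, and the result is used as the heading for that item's group; consecutive items that end up with the same key value land in the same group.
The return value is an iterator of (key, group) pairs, which resembles a dictionary's {key: value} form, except that the same key can occur more than once.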
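A minimal sketch of what the call yields, using a made-up word list (the `words` data is an assumption for illustration, not from the documentation). Each iteration produces a `(key, group)` pair, and the same key can come back a second time when equal-key items are not adjacent:

```python
from itertools import groupby

# Hypothetical sample data; the two length-3 runs are deliberately separated.
words = ['ant', 'bee', 'crow', 'dove', 'eel']

# key=len is applied to every item; consecutive items with the same
# result are yielded together as one (key, group) pair.
pairs = [(length, list(group)) for length, group in groupby(words, key=len)]
print(pairs)
# [(3, ['ant', 'bee']), (4, ['crow', 'dove']), (3, ['eel'])]
```

Storing these pairs in a dict would keep only the last group for the key 3, which is the effect shown in the examples.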
Example 1
from itertools import groupby

# note here that the tuple counts as one item in this list. I did not
# specify any key, so each item in the list is a key on its own.
c = groupby(['goat', 'dog', 'cow', 1, 1, 2, 3, 11, 10, ('persons', 'man', 'woman')])
dic = {}
for k, v in c:
    dic[k] = list(v)
dic
Results in
{1: [1, 1],
'goat': ['goat'],
3: [3],
'cow': ['cow'],
('persons', 'man', 'woman'): [('persons', 'man', 'woman')],
10: [10],
11: [11],
2: [2],
'dog': ['dog']}
Example 2
# notice here that mulato, cow and cat don't show up. only the last group with a certain key shows up, replacing the earlier result
# the last group for 'c' actually wipes out the earlier ['cow', 'cat'] group.
list_things = ['goat', 'dog', 'donkey', 'mulato', 'cow', 'cat', ('persons', 'man', 'woman'),
               'wombat', 'mongoose', 'malloo', 'camel']
c = groupby(list_things, key=lambda x: x[0])
dic = {}
for k, v in c:
    dic[k] = list(v)
dic
Results in
{'c': ['camel'],
'd': ['dog', 'donkey'],
'g': ['goat'],
'm': ['mongoose', 'malloo'],
'persons': [('persons', 'man', 'woman')],
'w': ['wombat']}
Now for the sorted version
# but observe the sorted version where I have the data sorted first on same key I used for grouping
list_things = ['goat', 'dog', 'donkey', 'mulato', 'cow', 'cat', ('persons', 'man', 'woman'),
               'wombat', 'mongoose', 'malloo', 'camel']
sorted_list = sorted(list_things, key=lambda x: x[0])
print(sorted_list)
print()
c = groupby(sorted_list, key=lambda x: x[0])
dic = {}
for k, v in c:
    dic[k] = list(v)
dic
Results in
['cow', 'cat', 'camel', 'dog', 'donkey', 'goat', 'mulato', 'mongoose', 'malloo', ('persons', 'man', 'woman'), 'wombat']
{'c': ['cow', 'cat', 'camel'],
'd': ['dog', 'donkey'],
'g': ['goat'],
'm': ['mulato', 'mongoose', 'malloo'],
'persons': [('persons', 'man', 'woman')],
'w': ['wombat']}
Example 3
things = [("animal", "bear"), ("animal", "duck"), ("plant", "cactus"), ("vehicle", "harley"),
          ("vehicle", "speed boat"), ("vehicle", "school bus")]
dic = {}
f = lambda x: x[0]
for key, group in groupby(sorted(things, key=f), f):
    dic[key] = list(group)
dic
Results in
{'animal': [('animal', 'bear'), ('animal', 'duck')],
'plant': [('plant', 'cactus')],
'vehicle': [('vehicle', 'harley'),
('vehicle', 'speed boat'),
('vehicle', 'school bus')]}
Now for the sorted version. Here I changed the tuples to lists. The result is the same either way.
things = [["animal", "bear"], ["animal", "duck"], ["vehicle", "harley"], ["plant", "cactus"],
          ["vehicle", "speed boat"], ["vehicle", "school bus"]]
dic = {}
f = lambda x: x[0]
for key, group in groupby(sorted(things, key=f), f):
    dic[key] = list(group)
dic
Results in
{'animal': [['animal', 'bear'], ['animal', 'duck']],
'plant': [['plant', 'cactus']],
'vehicle': [['vehicle', 'harley'],
['vehicle', 'speed boat'],
['vehicle', 'school bus']]}
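As always, the documentation of the function should be the first place to check. However, itertools.groupby is certainly one of the trickiest itertools because it has some possible pitfalls:
It only groups items whose key result is the same for successive items: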
from itertools import groupby
for key, group in groupby([1,1,1,1,5,1,1,1,1,4]):
    print(key, list(group))
# 1 [1, 1, 1, 1]
# 5 [5]
# 1 [1, 1, 1, 1]
# 4 [4]
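One can call sorted beforehand if one wants to do an overall groupby.
It yields two items, and the second one is a lazy iterator (so one needs to iterate over the second item!). In the previous example I explicitly needed to convert these to a list.
The second yielded element is discarded if one advances the groupby iterator: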
it = groupby([1,1,1,1,5,1,1,1,1,4])
key1, group1 = next(it)
key2, group2 = next(it)
print(key1, list(group1))
# 1 []
Even though group1 is not empty!
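The empty result above follows from each group sharing the underlying iterator with the groupby object. A minimal sketch of the safe pattern, assuming the same input list as above, is to materialize every group with list() before asking for the next pair:

```python
from itertools import groupby

data = [1, 1, 1, 1, 5, 1, 1, 1, 1, 4]

# Convert each group to a list while it is still the current group,
# i.e. before the outer groupby iterator advances to the next key.
runs = [(key, list(group)) for key, group in groupby(data)]
print(runs)
# [(1, [1, 1, 1, 1]), (5, [5]), (1, [1, 1, 1, 1]), (4, [4])]
```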
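As already mentioned, one can use sorted to do an overall groupby operation, but that is extremely inefficient (and it throws away the memory efficiency if you want to use groupby on generators). There are better alternatives available if you cannot guarantee that the input is sorted, and they also avoid the O(n log(n)) sorting overhead.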
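One such alternative, sketched here as an assumption rather than as the approach this answer had in mind, is a single pass that collects items into a collections.defaultdict. The helper name `group_all` is hypothetical; the data is the list from Example 2 of the other answer:

```python
from collections import defaultdict

def group_all(iterable, key=None):
    """Group every item by key in one pass, regardless of input order.

    Hypothetical helper for illustration; not part of itertools.
    """
    groups = defaultdict(list)
    for item in iterable:
        # With no key function, the item itself acts as the key.
        groups[item if key is None else key(item)].append(item)
    return dict(groups)

list_things = ['goat', 'dog', 'donkey', 'mulato', 'cow', 'cat',
               ('persons', 'man', 'woman'),
               'wombat', 'mongoose', 'malloo', 'camel']
grouped = group_all(list_things, key=lambda x: x[0])
print(grouped['c'])  # ['cow', 'cat', 'camel']
print(grouped['m'])  # ['mulato', 'mongoose', 'malloo']
```

This needs no sorting and works on generators, at the cost of holding all groups in memory and requiring hashable keys.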
However, groupby is great for checking local properties. There are two recipes in the itertools recipes section:
def all_equal(iterable):
    "Returns True if all the elements are equal to each other"
    g = groupby(iterable)
    return next(g, True) and not next(g, False)
and:
from operator import itemgetter

def unique_justseen(iterable, key=None):
    "List unique elements, preserving order. Remember only the element just seen."
    # unique_justseen('AAAABBBCCDAABBB') --> A B C D A B
    # unique_justseen('ABBCcAD', str.lower) --> A B C A D
    return map(next, map(itemgetter(1), groupby(iterable, key)))
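A short usage sketch of the two recipes. The definitions are repeated so the snippet runs on its own, and the sample inputs other than the two strings from the recipe comments are made up for illustration:

```python
from itertools import groupby
from operator import itemgetter

def all_equal(iterable):
    # At most one group exists when all elements are equal.
    g = groupby(iterable)
    return next(g, True) and not next(g, False)

def unique_justseen(iterable, key=None):
    # Take the first element of each consecutive run.
    return map(next, map(itemgetter(1), groupby(iterable, key)))

print(all_equal([1, 1, 1]))   # True
print(all_equal([1, 1, 2]))   # False
print(all_equal([]))          # True (an empty input has no differing elements)

print(list(unique_justseen('AAAABBBCCDAABBB')))     # ['A', 'B', 'C', 'D', 'A', 'B']
print(list(unique_justseen('ABBCcAD', str.lower)))  # ['A', 'B', 'C', 'A', 'D']
```

Both rely on groupby only looking at neighbouring items, so neither needs sorted input.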
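Thanks. I will definitely keep an eye out if I need any of the alternatives. Right now I am reading the documentation section by section so that I do not mix everything up. Happy new year to you – Parousia
Did you check the [`groupby()` documentation](https://docs.python.org/2/library/itertools.html#itertools.groupby)? Which part is not straightforward?
@MoinuddinQuadri The first sentence of the OP's question states that they read the Python documentation.
You ask a question and you have a detailed answer ready? Really? Why not keep all of the question in the question and leave the answer section for discussion?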