Grouping data in a for loop

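I need to iterate over a sorted dataset and group the results by the sorted attribute, so that each chunk holds all the rows sharing the same attribute value. I then run some operations on each chunk.

Sorry, that's a bit confusing; an example probably explains what I'm doing better:

The real structure I have looks like this, except that the "data" strings in the dataset are actually objects containing a lot of other data.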

[ [1, "data1"], [1, "data2"], [2, "moredata"], [2, "stuff"], 
    [2, "things"], [2, "foo"], [3, "bar"], [4, "baz"] ]

What I want is for this data to be split into 4 separate function calls:

process_data(1, ["data1", "data2"]) 
process_data(2, ["moredata", "stuff", "things", "foo"]) 
process_data(3, ["bar"]) 
process_data(4, ["baz"])

What I ended up with is a structure that looks like this:

last_id = None 
grouped_data = [] 

for row in dataset: 
    id = row[0] 
    data = row[1] 

    if grouped_data and last_id != id:
        # we're starting a new group, process the last group
        # (checking grouped_data skips the empty "group" before the first row)
        process_data(last_id, grouped_data)
        grouped_data = []
    last_id = id
    grouped_data.append(data)

if grouped_data: 
    # we're done the loop and we still have a last group of data to process 
    # if there was no data in the dataset, grouped_data will still be empty 
    # so we won't accidentally process any empty data. 
    process_data(last_id, grouped_data)

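It works, but it seems clumsy, especially having to keep track of the last_id variable and making a second call to process_data after the loop. I'd just like to know whether anyone can suggest a more elegant/clever solution.

My language of choice is Python, but a general solution is fine.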

Source

2012-08-06 cecilkorik

itertools.groupby is what you want:

>>> data = [ [1, "data1"], [1, "data2"], [2, "moredata"], [2, "stuff"], 
... [2, "things"], [2, "foo"], [3, "bar"], [4, "baz"] ] 
>>> 
>>> from itertools import groupby 
>>> from operator import itemgetter 
>>> 
>>> def process_data(key, keydata):
...     print(key, ':', keydata)
...
>>> for key, keydata in groupby(data, key=itemgetter(0)):
...     process_data(key, [d[1] for d in keydata])
... 
1 : ['data1', 'data2'] 
2 : ['moredata', 'stuff', 'things', 'foo'] 
3 : ['bar'] 
4 : ['baz']

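Pass groupby the sorted list, plus a key function that says how to group each item in the list. You get back a generator of (key, itemgenerator) pairs, which, as shown, are passed on to the process_data function I made up.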
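Since each group that groupby yields is a lazy iterator over the same underlying data, it can be convenient to turn every group into a list as soon as it arrives. Below is a minimal sketch of that idea; grouped_rows is a hypothetical helper name (not part of itertools), and it assumes the rows are two-item sequences already sorted by their first item.

```python
from itertools import groupby
from operator import itemgetter


def grouped_rows(rows):
    """Yield (key, values) pairs from rows already sorted by their first item.

    grouped_rows is a hypothetical helper for illustration only.
    """
    for key, group in groupby(rows, key=itemgetter(0)):
        # Materialize the lazy group iterator into a list before moving on.
        yield key, [row[1] for row in group]


data = [[1, "data1"], [1, "data2"], [2, "moredata"], [2, "stuff"],
        [2, "things"], [2, "foo"], [3, "bar"], [4, "baz"]]

result = list(grouped_rows(data))
for key, values in result:
    print(key, values)
```

With a helper like this, the calling loop reduces to a single call per group, with no last_id bookkeeping and no extra call after the loop.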

Source

2012-08-06 06:21:20 PaulMcG

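This is perfect. As always, Python comes with all the batteries included; it's just a matter of finding where they are. Thanks for pointing me in the right direction! – cecilkorik 2012-08-06 16:40:57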

Have a look at itertools.groupby. Note that it requires your list to already be sorted by the grouping key (your example data is, so I guess that's fine).
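To illustrate the sorting requirement, here is a small sketch with made-up rows (unsorted_rows is an illustrative name, not data from the question). groupby only merges adjacent items with equal keys, so unsorted input produces fragmented groups; sorting by the same key first gives one group per key.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows that are NOT sorted by their first item.
unsorted_rows = [[2, "moredata"], [1, "data1"], [2, "stuff"], [1, "data2"]]

# Without sorting, equal keys that are not adjacent land in separate groups.
fragmented = [(key, [row[1] for row in group])
              for key, group in groupby(unsorted_rows, key=itemgetter(0))]

# Sorting by the same key first makes equal keys adjacent.
# sorted() is stable, so rows keep their relative order within each key.
ordered = sorted(unsorted_rows, key=itemgetter(0))
grouped = [(key, [row[1] for row in group])
           for key, group in groupby(ordered, key=itemgetter(0))]

print(fragmented)
print(grouped)
```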

Source

2012-08-06 06:16:57 BrenBarn

You can use a MultiDict, e.g. from the brownie or werkzeug packages.

from brownie.datastructures import MultiDict 
data = [ [1, "data1"], [1, "data2"], [2, "moredata"], [2, "stuff"], 
     [2, "things"], [2, "foo"], [3, "bar"], [4, "baz"] ] 
for key, keydata in MultiDict(data).iterlists(): 
    process_data(key, keydata)
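If adding a third-party package is not an option, a similar key-to-list mapping can be sketched with the standard library. Note that this swaps in collections.defaultdict rather than MultiDict, so it is an alternative rather than the approach above; unlike groupby, it does not need the input to be sorted, but it holds every group in memory at once. The names groups and calls are illustrative.

```python
from collections import defaultdict

data = [[1, "data1"], [1, "data2"], [2, "moredata"], [2, "stuff"],
        [2, "things"], [2, "foo"], [3, "bar"], [4, "baz"]]

# Map each key to the list of values seen for it.
groups = defaultdict(list)
for key, value in data:
    groups[key].append(value)

# Dicts preserve insertion order (Python 3.7+), so keys come out in
# the order they were first seen in the input.
calls = [(key, values) for key, values in groups.items()]

for key, values in calls:
    print(key, values)
```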

Source

2012-08-06 06:24:01
