Iterating through a list of dictionaries with conditions

Suppose test is a large list of dictionaries (this is only a sample):

test = [ 
{'alignedWord': 'welcome', 
    'case': 'success', 
    'end': 0.9400000000000001, 
    'start': 0.56, 
    'word': 'Welcome'}, 

{'alignedWord': 'to', 
    'case': 'success', 
    'end': 1.01, 
    'start': 0.94, 
    'word': 'to'}, 

{'alignedWord': 'story', 
    'case': 'not-found-in-audio', 
    'word': 'Story'}, 

{'alignedWord': 'in', 
    'case': 'success', 
    'end': 1.4100000000000001, 
    'start': 1.34, 
    'word': 'in'}, 

{'alignedWord': 'a', 
    'case': 'success', 
    'end': 1.44, 
    'start': 1.41, 
    'word': 'a'}, 

{'alignedWord': 'bottle', 
    'case': 'success', 
    'end': 1.78, 
    'start': 1.44, 
    'word': 'Bottle'} ]

I want to output one JSON object for each consecutive chunk of words where case == 'success' and duration_s < 10:

Output: 

{"text": "Welcome to", "duration_s": 0.45} 
{"text": "in a bottle", "duration_s": 0.44}

duration_s = ('end' - 'start')  # of the whole text chunk
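As a worked check of the expected output above, the duration of a chunk can be read as the 'end' of its last word minus the 'start' of its first word. This is only a sketch under that reading; the helper name chunk_duration is hypothetical and not part of the question.

```python
def chunk_duration(words):
    """Duration of a chunk: end of the last word minus start of the first word."""
    return round(words[-1]['end'] - words[0]['start'], 2)


# The first two entries of the sample list, reduced to the keys used here.
welcome_to = [
    {'word': 'Welcome', 'start': 0.56, 'end': 0.9400000000000001},
    {'word': 'to', 'start': 0.94, 'end': 1.01},
]

print(chunk_duration(welcome_to))  # 1.01 - 0.56, rounded to two decimals
```

For the sample data the words in a chunk are contiguous, so this gives the same value as summing the per-word durations.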

Source

2017-04-14 MathHeat44

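If you want someone to walk you through something, Stack Overflow isn't the place to look for it. Walking you through something takes too much back-and-forth for this format; Stack Overflow is more "ask a specific, focused question, get an answer, interaction over". – user2357112

So, try implementing the pseudocode above, and come back to us when you run into a *specific* problem. – blacksite

Welcome to SO. This is a better question than most new users post, so don't feel bad. My suggestion for editing your question: show the output your code currently gives, and show the output you want. Then, since you've already given us the list of dictionaries, people can try their code and confirm that they get the output you want. Once we have your data and the desired output for context, it also takes less text to explain the logic you need. –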

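I added a new dictionary without 'start' and 'end' keys to the middle of the list test; does it work for you now? I've clarified the answer and also changed how the duration is handled.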

import json 
from collections import OrderedDict 

list_of_dicts = test  # assumes the sample list from the question 

# add 'duration' var to dicts (makes code in loop clearer) 
for dict_ in list_of_dicts: 
    try: 
        dict_['duration'] = dict_['end'] - dict_['start'] 
    except KeyError: 
        # no timing info (e.g. 'not-found-in-audio'): sentinel that forces a split 
        dict_['duration'] = 999 

# initialize result_dict with keys we'll add to 
rolling_duration = 0 
result_dict = OrderedDict([('text', ''), ('duration', 0)]) 

# looping directly through objects as mentioned in comments 
for dict_ in list_of_dicts: 
    rolling_duration = rolling_duration + dict_['duration'] 

    if dict_['case'] == 'success' and rolling_duration < 10: 
        result_dict['text'] = (result_dict['text'] + " " + dict_['word']).lstrip() 
        result_dict['duration'] = round(rolling_duration, 2) 

    # print accrued results and reset dict/rolling duration 
    else: 
        if result_dict['text'] != '': 
            print(json.dumps(result_dict)) 
        result_dict = OrderedDict([('text', ''), ('duration', 0)]) 
        rolling_duration = 0 

# print final json result_dict after exiting loop (skip it if it is empty) 
if result_dict['text'] != '': 
    print(json.dumps(result_dict))

{"text": "Welcome to", "duration": 0.45}

{"text": "in a Bottle", "duration": 0.44}

Source

2017-04-15 00:41:46

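This is a great start, thank you. When my list has more than 155 items, it gives me an error. It works on the sample set I provided. Also, I don't think I explained it well: the duration has to be for the whole text chunk and less than 10; if a chunk goes over 10, I want to start a new chunk under the same conditions. – MathHeat44

Traceback (most recent call last): File "/Users/TracyShields/Scribie/Podcast-Data/new_align.py", line 22, in dict_.update({'duration': dict_['end'] - dict_['start']}) KeyError: 'end' I get this error when my list – MathHeat44

Your second point is a simple modification: just keep track of a 'rolling_duration' variable (outside of any dictionary) and include it in the condition of the 'if' statement. –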
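To make the two points from these comments concrete, here is a rough sketch of a loop that keeps a rolling duration, starts a new chunk when the limit would be reached, and uses membership checks instead of direct indexing so entries without 'start'/'end' do not raise KeyError. The function name chunk_words, the max_duration parameter and the toy input are assumptions for illustration, not the answerer's code.

```python
def chunk_words(words, max_duration=10):
    """Yield chunks of consecutive usable words whose total duration stays below max_duration."""
    text, rolling_duration = [], 0.0
    for w in words:
        # A word is usable only if it was aligned and carries timing keys.
        usable = w.get('case') == 'success' and 'start' in w and 'end' in w
        duration = w['end'] - w['start'] if usable else 0.0

        # Close the current chunk on an unusable word, or when adding this
        # word would bring the chunk to max_duration or beyond.
        if text and (not usable or rolling_duration + duration >= max_duration):
            yield {'text': ' '.join(text), 'duration_s': round(rolling_duration, 2)}
            text, rolling_duration = [], 0.0

        if usable:
            text.append(w['word'])
            rolling_duration += duration

    if text:  # emit whatever is left after the loop
        yield {'text': ' '.join(text), 'duration_s': round(rolling_duration, 2)}


# Toy input (made up): three 4-second words, one unaligned word, one short word.
words = [
    {'word': 'a', 'case': 'success', 'start': 0.0, 'end': 4.0},
    {'word': 'b', 'case': 'success', 'start': 4.0, 'end': 8.0},
    {'word': 'c', 'case': 'success', 'start': 8.0, 'end': 12.0},
    {'word': 'x', 'case': 'not-found-in-audio'},
    {'word': 'd', 'case': 'success', 'start': 12.0, 'end': 13.0},
]
result = list(chunk_words(words))
print(result)
```

Note that a single word that is itself 10 seconds or longer still ends up in a chunk of its own in this sketch; whether that should be dropped instead depends on what you need.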

This could be solved with a generator that yields the final dictionaries as needed:

def split(it): 
    it = iter(it) 
    acc, duration = [], 0  # defaults 
    for item in it: 
        if item['case'] != 'success':  # split when there's a non-success 
            if acc: 
                yield {'text': ' '.join(acc), 'duration': duration} 
                acc, duration = [], 0  # reset defaults 

        else: 
            tmp_duration = item['end'] - item['start'] 

            if tmp_duration + duration >= 10:  # split when the duration is too long 
                if acc: 
                    yield {'text': ' '.join(acc), 'duration': duration} 
                acc, duration = [item['word']], tmp_duration  # new defaults 

            else: 
                acc.append(item['word']) 
                duration += tmp_duration 

    if acc:  # give the remaining items 
        yield {'text': ' '.join(acc), 'duration': duration}

A simple test gives:

>>> list(split(test)) 
[{'duration': 0.45000000000000007, 'text': 'Welcome to'}, 
{'duration': 0.44000000000000017, 'text': 'in a Bottle'}]

The result can then easily be dumped to JSON:

>>> import json 
>>> json.dumps(list(split(test))) 
'[{"text": "Welcome to", "duration": 0.45000000000000007}, {"text": "in a Bottle", "duration": 0.44000000000000017}]'
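If one JSON object per line with a rounded 'duration_s' key is wanted, as in the question's expected output, something along these lines might work. It is a sketch only: to_json_lines and ndigits are hypothetical names, and the input is simply the list printed by the test above.

```python
import json


def to_json_lines(chunks, ndigits=2):
    """Serialize each chunk as one JSON object per line, rounding the duration."""
    return '\n'.join(
        json.dumps({'text': c['text'], 'duration_s': round(c['duration'], ndigits)})
        for c in chunks
    )


# Chunks copied from the test output above.
chunks = [
    {'text': 'Welcome to', 'duration': 0.45000000000000007},
    {'text': 'in a Bottle', 'duration': 0.44000000000000017},
]
print(to_json_lines(chunks))
```

Rounding only at serialization time keeps the accumulated durations exact while the chunks are being built.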

Source

2017-04-15 02:35:31 MSeifert
