How to iterate over a defaultdict(list) in Python?

How do I iterate over a defaultdict(list) in Python? Is there a better way to have a dictionary of lists in Python? I tried the usual iter(dict), but I got this error:

>>> import para 
>>> para.print_doc('./sentseg_en/essentials.txt') 
Traceback (most recent call last): 
    File "<stdin>", line 1, in <module> 
    File "para.py", line 31, in print_doc 
    for para in iter(doc): 
TypeError: iteration over non-sequence

Main class:

import para 
para.print_doc('./foo/bar/para-lines.txt')

para.py:

# -*- coding: utf-8 -*- 
## Modified paragraph into a defaultdict(list) structure 
## Original code from http://code.activestate.com/recipes/66063/ 
from collections import defaultdict 
class Paragraphs: 
    import sys 
    reload(sys) 
    sys.setdefaultencoding('utf-8') 
    # Separator here refers to the paragraph separator,
    # the default separator is '\n'.
    def __init__(self, filename, separator=None):
        # Set separator if passed into object's parameter,
        # else set default separator as '\n'
        if separator is None:
            def separator(line): return line == '\n'
        elif not callable(separator):
            raise TypeError, "separator argument must be callable"
        self.separator = separator
        # Reading lines from files into a dictionary of lists
        self.doc = defaultdict(list)
        paraIndex = 0
        with open(filename) as readFile:
            for line in readFile:
                if line == separator:
                    paraIndex+=1
                else:
                    self.doc[paraIndex].append(line)

# Prints out populated doc from txtfile 
def print_doc(filename): 
    text = Paragraphs(filename) 
    for para in iter(text.doc):
        for sent in text.doc[para]:
            print "Para#%d, Sent#%d: %s" % (
                para, text.doc[para].index(sent), sent)

The file ./foo/bar/para-lines.txt looks like this:

This is a start of a paragraph. 
foo barr 
bar foo 
foo foo 
This is the end. 

This is the start of next para. 
foo boo bar bar 
this is the end.

The output of the main class should look like this:

Para#1,Sent#1: This is a start of a paragraph. 
Para#1,Sent#2: foo barr 
Para#1,Sent#3: bar foo 
Para#1,Sent#4: foo foo 
Para#1,Sent#5: This is the end. 

Para#2,Sent#1: This is the start of next para. 
Para#2,Sent#2: foo boo bar bar 
Para#2,Sent#3: this is the end.

2011-12-27 alvas

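The recipe you linked to is rather old. It was written in 2001, and Python has gained more modern tools since then, such as itertools.groupby (introduced in Python 2.4, released in late 2004). Here is what your code might look like using groupby: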

import itertools 
import sys 

with open('para-lines.txt', 'r') as f: 
    paranum = 0 
    for is_separator, paragraph in itertools.groupby(f, lambda line: line == '\n'):
        if is_separator:
            # we've reached paragraph separator
            print
        else:
            paranum += 1
            for n, sentence in enumerate(paragraph, start = 1):
                sys.stdout.write(
                    'Para#{i:d},Sent#{n:d}: {s}'.format(
                        i = paranum, n = n, s = sentence))

2011-12-27 16:25:12 unutbu

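Am I right to say that when I leave the 'for' loop, 'paragraph' goes out of scope? How do I retain the paragraph and keep accessing it outside the 'itertools.groupby' loop? – alvas 2011-12-27 16:48:25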

No, the name 'paragraph' does not go out of scope. Python does not open a new scope for block structures such as 'with' and 'for', only for functions. – kindall 2011-12-27 16:59:26
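
The scoping point in this comment can be checked with a small sketch. It assumes Python 3 (unlike the Python 2 code in the thread), and the names `word`, `make_local`, `inner`, and `leaked` are purely illustrative:

```python
for word in ["foo", "bar", "baz"]:
    pass

# The loop has finished, but `word` is still bound to the last item.
print(word)


def make_local():
    inner = "only visible inside the function"
    return inner


make_local()

# `inner` was local to the function, so looking it up here raises NameError.
try:
    inner
except NameError:
    leaked = False
else:
    leaked = True

print(leaked)
```

Note that the loop variable is only bound if the loop body ran at least once; after looping over an empty sequence the name would not exist.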

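'paragraph' is reassigned a new value on each pass through the loop. If you want to keep the old paragraphs, you can define a 'paragraphs = []' list outside the loop and append each paragraph inside the loop: 'paragraphs.append(paragraph)'. – unutbu 2011-12-27 17:00:17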
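
One caveat when following the comment above: each group that `itertools.groupby` yields is a lazy iterator sharing the underlying input, so it is emptied once the loop moves on. A minimal sketch, assuming Python 3 and an in-memory list standing in for the file (the names `lines` and `paragraphs` and the sample data are illustrative), copies each group into a list before storing it:

```python
import itertools

# Illustrative stand-in for the lines of para-lines.txt.
lines = [
    "This is a start of a paragraph.\n",
    "foo barr\n",
    "\n",
    "This is the start of next para.\n",
    "this is the end.\n",
]

paragraphs = []  # defined outside the loop, so it survives after the loop ends
for is_separator, paragraph in itertools.groupby(lines, lambda line: line == "\n"):
    if not is_separator:
        # list() materializes the group; the bare group iterator would be
        # exhausted as soon as groupby advances to the next group.
        paragraphs.append(list(paragraph))

print(len(paragraphs))   # number of paragraphs collected
print(paragraphs[1][0])  # first sentence of the second paragraph
```

With a real file, `lines` could be replaced by the open file object, as in the answer above.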

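The problem seems to be that you are iterating over your Paragraphs class rather than the dictionary. Also, instead of iterating over the keys and then looking up each dictionary entry, consider using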

for (key, value) in d.items():
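
A short sketch of that suggestion, assuming Python 3 and a small hand-filled `defaultdict(list)` (the sample data and the `output` list are illustrative). Iterating over `items()` yields each key together with its list, so no second lookup is needed, and `enumerate` numbers the sentences without calling `list.index`, which returns the first match and so misnumbers repeated sentences:

```python
from collections import defaultdict

doc = defaultdict(list)
doc[0].append("This is a start of a paragraph.")
doc[0].append("This is the end.")
doc[1].append("This is the start of next para.")

output = []
# sorted() keeps paragraphs in key order regardless of insertion order.
for para, sentences in sorted(doc.items()):
    for n, sent in enumerate(sentences, start=1):
        output.append("Para#%d,Sent#%d: %s" % (para + 1, n, sent))

print("\n".join(output))
```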

2011-12-27 16:02:52 Nicolas78

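It fails because you have not defined __iter__() in your Paragraphs class, and you then try to call iter(doc) (where doc is a Paragraphs instance).

To be iterable, a class must have an __iter__() method that returns an iterator. Docs here.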
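
A minimal sketch of the difference, assuming Python 3; the class names `WithoutIter` and `WithIter` are hypothetical and not taken from the question. (A class that implements `__getitem__` with integer indices is also iterable, but that route is not shown here.)

```python
from collections import defaultdict


class WithoutIter:
    def __init__(self):
        self.doc = defaultdict(list)


class WithIter(WithoutIter):
    def __iter__(self):
        # Delegate to the dictionary: iterating the object yields its keys.
        return iter(self.doc)


try:
    iter(WithoutIter())
    plain_is_iterable = True
except TypeError:
    # No __iter__ (and no __getitem__), so iter() refuses the object.
    plain_is_iterable = False

wrapped = WithIter()
wrapped.doc[0].append("foo")
keys = list(wrapped)

print(plain_is_iterable, keys)
```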

2011-12-27 16:04:14 soulcheck

You have the line

for para in iter(doc):

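The problem is that doc is an instance of Paragraphs, not a defaultdict. The defaultdict you use in the __init__ method goes out of scope and is lost. So you need to do two things:

Save the doc created in the __init__ method as an instance variable (self.doc, for example).
Either make Paragraphs itself iterable (by adding an __iter__ method), or allow access to the doc object it created.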
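
The two steps above can be sketched as follows. This is an illustration under stated assumptions rather than the asker's final code: it targets Python 3, takes an iterable of lines instead of a filename so that it runs without a file, and calls `separator(line)` instead of comparing the line to the separator function as the question's code does.

```python
from collections import defaultdict


class Paragraphs:
    """Group lines into paragraphs, keyed by paragraph index."""

    def __init__(self, lines, separator=None):
        if separator is None:
            def separator(line):
                return line == "\n"
        elif not callable(separator):
            raise TypeError("separator argument must be callable")
        self.separator = separator
        # Step 1: keep the dictionary on the instance so it outlives __init__.
        self.doc = defaultdict(list)
        para_index = 0
        for line in lines:
            if separator(line):  # call the separator rather than compare to it
                para_index += 1
            else:
                self.doc[para_index].append(line)

    # Step 2: make the object itself iterable.
    def __iter__(self):
        return iter(sorted(self.doc.items()))


text = Paragraphs(["a\n", "b\n", "\n", "c\n"])
for para, sentences in text:
    for n, sent in enumerate(sentences, start=1):
        print("Para#%d,Sent#%d: %s" % (para + 1, n, sent), end="")
```

Yielding `(index, sentences)` pairs from `__iter__` is one design choice among several; returning `iter(self.doc)` to yield only the keys would work as well.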

2011-12-27 16:06:11

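I tried saving 'doc' as 'self.doc', with 'self.doc = defaultdict(list)' and 'self.doc[paraIndex].append(line)'. But the same out-of-scope problem occurs. – alvas 2011-12-27 16:50:09

@2er0: It is in scope, but as 'doc.doc' (which means there is also a naming problem - you should use 'paragraph' rather than 'doc' in 'print_doc'). – 2011-12-27 17:26:09

Yes, thanks for spotting the naming problem; it crept in after some small changes while I was iterating on the code. But let me see whether I can combine the 'self.doc' solution with unutbu's loop solution. – alvas 2011-12-27 18:28:23

I can't think of any reason why you are using a dictionary here, let alone a defaultdict. A list of lists would be much simpler.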

我想不出爲什麼你在這裏使用字典，更不用說defaultdict了。列表清單會簡單得多。

separator = '\n'  # a blank line marks the end of a paragraph
doc = []
with open(filename) as readFile:
    para = []
    for line in readFile:
        if line == separator:
            doc.append(para)
            para = []
        else:
            para.append(line)
    doc.append(para)
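
As a usage sketch for the list-of-lists layout, assuming Python 3 and in-memory lines in place of the file: the helper names `split_paragraphs` and `format_doc` are hypothetical, and the numbering follows the expected output shown in the question.

```python
def split_paragraphs(lines, separator="\n"):
    """Return a list of paragraphs, each a list of lines."""
    doc = []
    para = []
    for line in lines:
        if line == separator:
            doc.append(para)
            para = []
        else:
            para.append(line)
    doc.append(para)  # the last paragraph has no trailing separator
    return doc


def format_doc(doc):
    """Number paragraphs and sentences from 1 using their list positions."""
    return [
        "Para#%d,Sent#%d: %s" % (i, j, sent.rstrip("\n"))
        for i, para in enumerate(doc, start=1)
        for j, sent in enumerate(para, start=1)
    ]


doc = split_paragraphs(["foo barr\n", "bar foo\n", "\n", "foo boo bar bar\n"])
print("\n".join(format_doc(doc)))
print(doc[1][0])  # direct access by paragraph and sentence position
```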

2011-12-27 16:09:42

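That's because my text file will be a large one, so access through nested lists would take a lot of time. Maybe I will need a dictionary of dictionaries. If I want a dict of dicts, what should I do? – alvas 2011-12-27 16:51:56

What makes you say that? Why do you think a nested list would take longer than a dictionary? – 2011-12-27 18:08:52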
