How to iterate over a defaultdict(list) in Python?
Is there a better way to have a dictionary of lists in Python? I tried the normal iter(dict), but I got this error:
>>> import para
>>> para.print_doc('./sentseg_en/essentials.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "para.py", line 31, in print_doc
    for para in iter(doc):
TypeError: iteration over non-sequence
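In Python 2, "iteration over non-sequence" means the object handed to iter() defines neither __iter__ nor __getitem__. A defaultdict(list) is a dict subclass and iterates over its keys, so the error suggests that doc at that point was something else, for example a Paragraphs instance rather than its .doc attribute. The sketch below assumes Python 3, where the same mistake reports "'Paragraphs' object is not iterable"; the Paragraphs class here is a hypothetical stand-in, not the asker's class.

```python
from collections import defaultdict


class Paragraphs:
    # Hypothetical stand-in: it holds a defaultdict but defines no __iter__.
    def __init__(self):
        self.doc = defaultdict(list)
        self.doc[1].append("first sentence")


text = Paragraphs()

# Iterating the defaultdict itself works: it yields the keys.
keys = list(iter(text.doc))

# Iterating the wrapper object fails, because it is not iterable.
try:
    iter(text)
    error = None
except TypeError as exc:
    error = str(exc)

print(keys)   # [1]
print(error)  # 'Paragraphs' object is not iterable
```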
Main script:
import para
para.print_doc('./foo/bar/para-lines.txt')
para.py:
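# -*- coding: utf-8 -*-
## Modified paragraph into a defaultdict(list) structure
## Original code from http://code.activestate.com/recipes/66063/
import io
from collections import defaultdict

class Paragraphs(object):
    # Separator here refers to the paragraph separator;
    # by default a line consisting only of '\n' ends a paragraph.
    def __init__(self, filename, separator=None):
        # Use the separator passed into the constructor if there is one,
        # else fall back to the default '\n' test.
        if separator is None:
            def separator(line): return line == '\n'
        elif not callable(separator):
            raise TypeError("separator argument must be callable")
        self.separator = separator
        # Read lines from the file into a dictionary of lists,
        # keyed by paragraph number starting at 1.
        # io.open decodes UTF-8 in both Python 2 and 3, replacing the
        # reload(sys) / sys.setdefaultencoding('utf-8') hack.
        self.doc = defaultdict(list)
        paraIndex = 1
        with io.open(filename, encoding='utf-8') as readFile:
            for line in readFile:
                # Call the separator; `line == separator` compared the
                # line with a function object and was never true.
                if self.separator(line):
                    paraIndex += 1
                else:
                    self.doc[paraIndex].append(line.rstrip('\n'))

# Prints out the populated doc from the text file
def print_doc(filename):
    text = Paragraphs(filename)
    # A defaultdict is a dict, so it can be iterated directly;
    # sorted() makes the paragraph order deterministic.
    for para, sents in sorted(text.doc.items()):
        # enumerate() numbers sentences from 1 and stays correct when a
        # sentence repeats, unlike list.index(), which finds the first match.
        for sentIndex, sent in enumerate(sents, 1):
            print("Para#%d, Sent#%d: %s" % (para, sentIndex, sent))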
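One caveat when iterating a defaultdict(list): reading a missing key with d[key] silently inserts an empty list, and doing that while looping over the same dict changes its size mid-iteration. Iterating over .items() only reads entries that already exist. A small Python 3 sketch with made-up data:

```python
from collections import defaultdict

doc = defaultdict(list)
doc[1].extend(["a", "b"])
doc[2].append("c")

# Iterating items() yields (key, list) pairs and never inserts keys.
counts = {key: len(sents) for key, sents in doc.items()}
print(counts)  # {1: 2, 2: 1}

# Looking up a missing key, however, silently creates it.
_ = doc[3]
print(sorted(doc))  # [1, 2, 3]

# Doing such a lookup inside a loop over the same dict fails.
try:
    for key in doc:
        doc[key + 10]  # missing key, so defaultdict inserts []
    raised = False
except RuntimeError:
    raised = True
print(raised)  # True
```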
The file ./foo/bar/para-lines.txt looks like this:
This is a start of a paragraph.
foo barr
bar foo
foo foo
This is the end.

This is the start of next para.
foo boo bar bar
this is the end.
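Assuming a blank line separates the two paragraphs in this file, the lines can be grouped into a defaultdict(list) keyed by paragraph number. This is a Python 3 sketch: read_paragraphs is a hypothetical helper, the in-memory sample list stands in for the file, and its default separator also treats whitespace-only lines as blank. It checks membership with `in` instead of doc[para_index], because the latter would insert an empty list.

```python
from collections import defaultdict


def read_paragraphs(lines, is_separator=lambda line: line.strip() == ""):
    """Group lines into a defaultdict(list) keyed by paragraph number (from 1)."""
    doc = defaultdict(list)
    para_index = 1
    for line in lines:
        if is_separator(line):
            # Only move on if the current paragraph has content, so runs of
            # blank lines do not leave gaps in the numbering.
            if para_index in doc:
                para_index += 1
        else:
            doc[para_index].append(line.rstrip("\n"))
    return doc


sample = [
    "This is a start of a paragraph.\n",
    "foo barr\n",
    "bar foo\n",
    "foo foo\n",
    "This is the end.\n",
    "\n",
    "This is the start of next para.\n",
    "foo boo bar bar\n",
    "this is the end.\n",
]
doc = read_paragraphs(sample)
print(dict(doc))
```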
The output of the main script should look like this:
Para#1,Sent#1: This is a start of a paragraph.
Para#1,Sent#2: foo barr
Para#1,Sent#3: bar foo
Para#1,Sent#4: foo foo
Para#1,Sent#5: This is the end.
Para#2,Sent#1: This is the start of next para.
Para#2,Sent#2: foo boo bar bar
Para#2,Sent#3: this is the end.
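To get 1-based numbers in this shape, one option is to sort the paragraph keys and number sentences with enumerate(..., 1) rather than list.index(), which is 0-based and returns the first match. A Python 3 sketch; format_doc is a hypothetical helper, and the small dict stands in for the populated doc (a plain dict and a defaultdict behave the same here).

```python
def format_doc(doc):
    """Yield 'Para#N,Sent#M: text' lines with 1-based sentence numbers."""
    for para, sents in sorted(doc.items()):
        for sent_no, sent in enumerate(sents, 1):
            yield "Para#%d,Sent#%d: %s" % (para, sent_no, sent)


doc = {
    1: ["This is a start of a paragraph.", "foo barr"],
    2: ["This is the start of next para."],
}
lines = list(format_doc(doc))
for out_line in lines:
    print(out_line)
```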
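Am I right in saying that 'paragraph' goes out of scope when I leave the 'for' loop? How can I keep the paragraphs and continue accessing them outside the 'itertools.groupby' loop? – alvas 2011-12-27 16:48:25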
No, the name 'paragraph' does not go out of scope. Python does not open a new scope for block structures such as 'with' and 'for', only for functions. – kindall 2011-12-27 16:59:26
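This can be checked directly. The Python 3 sketch below uses made-up data and a hypothetical make_local function; note that if a loop body never runs, its loop variable is never bound at all.

```python
import io

for paragraph in [["a"], ["b", "c"]]:
    pass
# A for loop does not open a new scope: the name keeps its last value.
print(paragraph)  # ['b', 'c']

with io.StringIO("first line\n") as handle:
    first = handle.readline()
# Names bound inside a with block are still visible afterwards.
print(repr(first))  # 'first line\n'


def make_local():
    local_name = "only inside"
    return local_name


make_local()
# A function body does open a new scope, so its locals do not leak out.
print("local_name" in globals())  # False
```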
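'paragraph' is reassigned a new value on each pass through the loop. If you want to keep the old paragraphs, define a list 'paragraphs = []' outside the loop and append each paragraph to it inside the loop: 'paragraphs.append(paragraph)'. – unutbu 2011-12-27 17:00:17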
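A Python 3 sketch of that suggestion, using itertools.groupby as the thread does. The input list and the blank-line separator are assumptions for illustration. Each group is an iterator that shares its source with groupby and becomes unusable once groupby advances, so it is copied into a list before being stored.

```python
import itertools

lines = ["a\n", "b\n", "\n", "c\n", "\n", "\n", "d\n"]

paragraphs = []  # defined once, outside the loop
for is_separator, group in itertools.groupby(lines, lambda line: line == "\n"):
    if not is_separator:
        # Copy the one-shot group iterator into a list before storing it.
        paragraph = [line.rstrip("\n") for line in group]
        paragraphs.append(paragraph)

print(paragraphs)  # [['a', 'b'], ['c'], ['d']]
```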