Unicode written correctly, read incorrectly

I have a .TSV file of movie names and movie data that I am analyzing with the pydot package. The file is linked here. The file containing the JSON used to create it is linked here.

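The file was written from parsed JSON using utf-8 encoding. Although the file is written correctly, when I read it back into Python the interpreter consistently seems to stop at the following line: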

'Taken\t["Liam Neeson", " Maggie Grace", " Jon Gries", " David Warshofsky"]\n' 
'The Walking Dead\t["Andrew Lincoln", " Steven Yeun", " Chandler Riggs",'

The output should look like this, and this is how it is written in the file:

Taken ["Liam Neeson", " Maggie Grace", " Jon Gries", " David Warshofsky"] 
The Walking Dead ["Andrew Lincoln", " Steven Yeun", " Chandler Riggs", " Norman Reedus"] 
Toy Story 3 ["Tom Hanks", " Tim Allen", " Joan Cusack", " Ned Beatty"]

Here is the code used to create the text file:

import codecs
import json

step3v2=open('step3.txt', 'rU')  # one JSON object per line
step4=codecs.open('step4.txt', mode='w', encoding='utf-8')
data=[]
merged=''
for line in step3v2:
    data.append(json.loads(line))

for row in data:
    moviename=row[u'Title']
    row[u'Actors']=row[u'Actors'].split(',')  # comma-separated string -> list
    actors=json.dumps(row[u'Actors']) + '\r\n'
    merged+=moviename + '\t'
    merged+=actors
step4.write(merged)

Here is the code that reads the file:

import pydot

graph=pydot.Dot(graph_type='graph', charset='utf8')
step4v2=open('step4.txt', 'rU')

textfile=step4v2.readlines()
for line in textfile:
    print repr(line)

2014-02-11 Mike

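"The interpreter seems to stop arbitrarily at the following line": what does that mean? Is there an error? Or is it just waiting, or what? –

No error. The interpreter simply doesn't read any more of the string. I'll edit the question to make it clearer. – Mike

Does it happen sometimes? Or always? –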

step4v2=open('step4.txt', 'rU') #this means universal newlines

should perhaps be

step4v2=open('step3.txt', 'rb') #this means read the binary data

Using the Dropbox file you linked,

>>> f =open (os.path.expanduser("~\\Downloads\\step4.txt"),"rb") 
>>> for line in f: print repr(line)

seems to work just fine.
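To make the binary-read suggestion concrete, here is a minimal sketch in Python 3 syntax (the thread itself uses Python 2). It opens the file with 'rb', decodes each line as UTF-8 explicitly, and splits it into a title and a JSON list of actors. The helper name read_movies, the file name step4_demo.txt, and the sample rows are made up for illustration and are not from the original post.

```python
import json
import os
import tempfile


def read_movies(path):
    """Read 'title<TAB>json-list' lines from a UTF-8 file opened in binary mode."""
    movies = []
    with open(path, "rb") as f:  # raw bytes, no newline translation
        for raw in f:
            # Decode explicitly, then drop the trailing '\r\n' or '\n'.
            line = raw.decode("utf-8").rstrip("\r\n")
            if not line:
                continue
            title, actors_json = line.split("\t", 1)
            movies.append((title, json.loads(actors_json)))
    return movies


# Self-contained demo on a hypothetical sample file.
sample = (
    'Taken\t["Liam Neeson", " Maggie Grace"]\r\n'
    'Toy Story 3\t["Tom Hanks", " Tim Allen"]\r\n'
)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "step4_demo.txt")
    with open(demo_path, "wb") as f:
        f.write(sample.encode("utf-8"))
    movies = read_movies(demo_path)

for title, actors in movies:
    print(repr(title), actors)
```

Decoding by hand keeps the newline handling and the encoding step visible, which can help when checking whether the bytes on disk are complete.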

2014-02-11 03:29:55

The same problem occurs. – Mike

Hmm, OK, that's weird... –

Err, maybe I told you the wrong one to change... –
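One possibility that this thread does not confirm: the writing code never closes step4, so if the file is read back in the same run, part of the text may still be sitting in the writer's buffer and the read would appear to stop mid-line. The sketch below (Python 3 syntax, using the built-in open rather than codecs.open) closes the writer with a with-block before reading. The rows and the file name step4_demo.txt are made up for illustration.

```python
import json
import os
import tempfile

# Hypothetical rows standing in for the parsed JSON from step3.txt.
rows = [
    {"Title": "Taken", "Actors": "Liam Neeson, Maggie Grace"},
    {"Title": "The Walking Dead", "Actors": "Andrew Lincoln, Steven Yeun"},
]

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "step4_demo.txt")

    # Leaving the with-block closes the file, which flushes buffered text to disk.
    # newline="" writes the '\r\n' terminators exactly as given.
    with open(out_path, "w", encoding="utf-8", newline="") as out:
        for row in rows:
            actors = json.dumps(row["Actors"].split(","))
            out.write(row["Title"] + "\t" + actors + "\r\n")

    # Only now read the file back; newline="" keeps the '\r\n' endings intact.
    with open(out_path, "r", encoding="utf-8", newline="") as src:
        lines_read = src.readlines()

for line in lines_read:
    print(repr(line))
```

If the truncation disappears once the writer is closed before the read, buffering was the cause. If it persists, the problem lies elsewhere.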
