Python I/O: mixed data types

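I'm writing a small script that merges a large number of JSON files from one directory into a single file. The trouble is, I'm not entirely sure what state my data is in. TypeErrors abound. Here is the script: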

import glob
import json
import codecs

reader = codecs.getreader("utf-8")

for file in glob.glob("/Users/me/Scripts/BagOfJson/*.json"):
    # Aha, as binary here
    with open(file, "rb") as infile:
        data = json.load(reader(infile))
        # If I print(data) here, looks like good ol' JSON

        with open("test.json", "wb") as outfile:
            json.dump(data, outfile, sort_keys=True, indent=2, ensure_ascii=False)
            # Crash

This script produces the following error:

TypeError: a bytes-like object is required, not 'str'

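It is raised by the json.dump line.

Naively, I just removed the 'b' from 'wb' in the open call for outfile. That did not fix it.

Maybe this is a lesson for me to test in the shell and to use Python's type() function. Still, I'd love it if someone could clear up the logic behind these data exchanges for me. I wish it could all be strings...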


2016-08-18 Typhon

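What happened when you removed the 'b'? Perhaps you got a *different* error?

Also, is this Python 2 or Python 3?

@MartijnPieters Well, Martijn, I'll tell you what happens when I remove the 'b' from 'wb'. It works. I must have had some other error when I tried this. Thanks for your sensible question! This is Python 3 – Typhon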

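If this is Python 3, removing the b (binary mode) so that the file is opened in text mode should work just fine. You probably want to specify the encoding explicitly: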

with open("test.json", "w", encoding='utf8') as outfile: 
    json.dump(data, outfile, sort_keys = True, indent = 2, ensure_ascii = False)

rather than rely on the default.
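As for the logic behind the original error: in Python 3, json.dump() writes str chunks, and a file opened in binary mode only accepts bytes. A minimal sketch of that difference follows, using in-memory io.StringIO and io.BytesIO objects as stand-ins for text-mode and binary-mode files. The stand-ins and the sample data are assumptions for illustration, not part of the question.

```python
import io
import json

data = {"name": "café"}  # hypothetical sample data

# A text stream accepts the str chunks that json.dump() produces.
text_stream = io.StringIO()
json.dump(data, text_stream, ensure_ascii=False)
text_result = text_stream.getvalue()

# A binary stream only accepts bytes, so the same call raises TypeError.
binary_stream = io.BytesIO()
try:
    json.dump(data, binary_stream, ensure_ascii=False)
    error = None
except TypeError as exc:
    error = str(exc)

print(text_result)
print(error)
```

The encoding step (str to bytes) is what the text-mode file object does for you, which is why passing encoding='utf8' to open() is enough.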

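You shouldn't really need codecs.getreader(). The standard open() function handles UTF-8 files just fine; simply open the input file in text mode as well and specify the encoding: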

import glob
import json

for file in glob.glob("/Users/me/Scripts/BagOfJson/*.json"):
    with open(file, "r", encoding='utf8') as infile:
        data = json.load(infile)
        with open("test.json", "w", encoding='utf8') as outfile:
            json.dump(data, outfile, sort_keys=True, indent=2, ensure_ascii=False)

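The above will still re-create test.json for every file matched by the *.json glob; you can't really put multiple JSON documents in the same file (unless you are specifically creating JSON Lines files, which you are not doing here because you are using indent).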
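If the goal is one merged file, one option is to collect every parsed document in a list and dump that list once, after the loop. This is only a sketch, assuming that a top-level JSON array is an acceptable merged format; merge_json_files and its parameters are hypothetical names, not from the question.

```python
import glob
import json


def merge_json_files(pattern, out_path):
    """Load every file matching pattern and write them as one JSON array."""
    merged = []
    # sorted() makes the output order deterministic; glob order is arbitrary.
    for path in sorted(glob.glob(pattern)):
        with open(path, "r", encoding="utf8") as infile:
            merged.append(json.load(infile))
    # Open the output only once, after all inputs have been read.
    with open(out_path, "w", encoding="utf8") as outfile:
        json.dump(merged, outfile, sort_keys=True, indent=2, ensure_ascii=False)
    return merged
```

It could be called as merge_json_files("/Users/me/Scripts/BagOfJson/*.json", "test.json"). Keep the output path outside the glob pattern, otherwise a second run would pick up the merged file as an input.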

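If you want to reformat every JSON file in the glob, you'll need to write each one to a new filename and then move the new file back over the original file filename.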
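A rough sketch of that write-then-move approach is shown here. The helper name reformat_json_file and the ".tmp" suffix are assumptions made for illustration, and os.replace() is just one way to do the move.

```python
import json
import os


def reformat_json_file(path):
    """Rewrite one JSON file with sorted keys and 2-space indentation."""
    with open(path, "r", encoding="utf8") as infile:
        data = json.load(infile)
    tmp_path = path + ".tmp"  # hypothetical temporary filename
    with open(tmp_path, "w", encoding="utf8") as outfile:
        json.dump(data, outfile, sort_keys=True, indent=2, ensure_ascii=False)
    # os.replace() overwrites the destination if it already exists.
    os.replace(tmp_path, path)
```

Calling reformat_json_file(file) inside the glob loop would then reformat each file in turn, and the original is only replaced after the new file has been written completely.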


2016-08-18 14:20:12
