Need help exporting data to a JSON file

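I'm only just getting into coding, and into Python in particular. At the moment I'm working on a web crawler. I need to save my data to a JSON file so that I can export it into MongoDB.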

import requests 
import json 
from bs4 import BeautifulSoup 

url= ["http://www.alternate.nl/html/product/listing.html?filter_5=&filter_4=&filter_3=&filter_2=&filter_1=&size=500&lk=9435&tk=7&navId=11626#listingResult"] 

amd = requests.get(url[0]) 
soupamd = BeautifulSoup(amd.content) 

prodname = [] 
adinfo = [] 
formfactor = [] 
socket = [] 
grafisch = [] 
prijs = [] 

a_data = soupamd.find_all("div", {"class": "listRow"}) 
for item in a_data: 
    try: 
     prodname.insert(len(prodname),item.find_all("span", {"class": "name"})[0].text) 
     adinfo.insert(len(adinfo), item.find_all("span", {"class": "additional"})[0].text) 
     formfactor.insert(len(formfactor), item.find_all("span", {"class": "info"})[0].text) 
     grafisch.insert(len(grafisch), item.find_all("span", {"class": "info"})[1].text) 
     socket.insert(len(socket), item.find_all("span", {"class": "info"})[2].text) 
     prijs.insert(len(prijs), item.find_all("span", {"class": "price right right10"})[0].text) 
    except: 
     pass

I'm stuck on this part. I want to export the data I saved in the lists to a JSON file. This is what I have now:

file = open("mobos.json", "w") 

for i = 0: 
    try: 
     output = {"productnaam": [prodname[i]], 
     "info" : [adinfo[i]], 
     "formfactor" : [formfactor[i]], 
     "grafisch" : [grafisch[i]], 
     "socket" : [socket[i]], 
     "prijs" : [prijs[i]]} 
     i + 1 
     json.dump(output, file) 
     if i == 500: 
      break 
    except: 
     pass 

file.close()

So I want to create dictionaries in this format:

{"productname" : [prodname[0]], "info" : [adinfo[0]], "formfactor" : [formfactor[0]] .......} 
{"productname" : [prodname[1]], "info" : [adinfo[1]], "formfactor" : [formfactor[1]] .......} 
{"productname" : [prodname[2]], "info" : [adinfo[2]], "formfactor" : [formfactor[2]] .......} etc.


2014-11-24 henktenk

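You may want to read the sections of the Python tutorial on loops and lists again. Don't use `listobject.insert(len(listobject), ...)`; use `listobject.append(...)` instead. And why not add all the information to **one** list (as dictionaries, for example), then just loop over that one list? You can use `for item in listobject:` and you won't need an index. – 2014-11-24 10:31:55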

You really *don't* want to use `try ... except` without naming specific exceptions; don't mask your bugs. – 2014-11-24 10:32:51

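Create the dictionaries first, collect them in one list, then save that one list to a JSON file so that the file holds a single valid JSON object: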

soupamd = BeautifulSoup(amd.content) 
products = [] 

for item in soupamd.select("div.listRow"): 
    prodname = item.find("span", class_="name") 
    adinfo = item.find("span", class_="additional") 
    formfactor, grafisch, socket = item.find_all("span", class_="info")[:3] 
    prijs = item.find("span", class_="price") 
    products.append({ 
     'prodname': prodname.text.strip(), 
     'adinfo': adinfo.text.strip(), 
     'formfactor': formfactor.text.strip(), 
     'grafisch': grafisch.text.strip(), 
     'socket': socket.text.strip(), 
     'prijs': prijs.text.strip(), 
    }) 

with open("mobos.json", "w") as outfile: 
    json.dump(products, outfile)

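If you really have to produce separate JSON objects, one per line, write a newline between them so that you can at least find the objects again (parsing would be a beast otherwise):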

with open("mobos.json", "w") as outfile: 
    for product in products: 
     json.dump(product, outfile) 
     outfile.write('\n')

Because we now have one list of objects, looping over that list with for is far simpler.
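To illustrate why the newline matters, here is a minimal sketch of writing one object per line and reading the file back. The sample `products` data and the temporary file path are made up for the example; they are not taken from the scraped page.

```python
import json
import os
import tempfile

# Hypothetical sample data standing in for the scraped `products` list.
products = [
    {"prodname": "Board A", "prijs": "\u20ac 59,90"},
    {"prodname": "Board B", "prijs": "\u20ac 89,00"},
]

# A throwaway path for the example; the thread itself uses "mobos.json".
path = os.path.join(tempfile.mkdtemp(), "mobos.json")

# Write one JSON object per line.
with open(path, "w") as outfile:
    for product in products:
        json.dump(product, outfile)
        outfile.write("\n")

# Read it back: every non-empty line is a complete JSON document.
with open(path) as infile:
    loaded = [json.loads(line) for line in infile if line.strip()]
```

Without the newline the objects would run together on one line, and a plain `json.loads()` per line would no longer work.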

Some other differences from your code:

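Use list.append() rather than list.insert(); there is no need for such verbose code when there is a standard method for the task.
If you are looking for just one match, use element.find() rather than element.find_all().
You really want to avoid blanket exception handling; you'll mask far more than you intend to. Catch only specific exceptions.
I use str.strip() to remove the extra whitespace that is commonly added in HTML documents; you could also add an extra ' '.join(textvalue.split()) to remove internal newlines and collapse whitespace, but this specific web page didn't seem to need that measure.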
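A small sketch of the last two points, kept independent of BeautifulSoup so it runs on its own. The helper names `clean` and `third_info` are hypothetical, and plain strings stand in for the `.text` of the scraped `span` elements.

```python
def clean(textvalue):
    """Collapse runs of whitespace, including newlines, into single spaces."""
    return " ".join(textvalue.split())


def third_info(infos):
    """Return the cleaned third entry, or None if there are fewer than three."""
    try:
        return clean(infos[2])
    except IndexError:
        # Only the expected failure is caught; any other error still surfaces.
        return None


cleaned = clean("  ATX \n   board ")
missing = third_info(["a", "b"])
present = third_info(["a", "b", " AM3+\n socket "])
```

Catching only `IndexError` means a typo or an unexpected `None` elsewhere still raises, instead of silently producing an empty output file.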


2014-11-24 10:42:15

Thanks for your help! My output has some unicode characters in it, like this: \u20ac. Is there a way to remove or replace them? – henktenk 2014-11-24 11:37:23

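@henkownz: All your output is Unicode; you mean that you have non-ASCII characters. :-) You have a [U+20AC EURO SIGN](http://codepoints.net/U+20ac), correctly escaped as JSON data; are you sure you want to get rid of those? You can always use explicit replacement (`str.replace()`) to remove them, or use `str.translate()` to remove multiple characters. Alternatively, you can use [`unidecode`](https://pypi.python.org/pypi/Unidecode) to replace any non-ASCII codepoints with their closest ASCII equivalent. – 2014-11-24 11:57:35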
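For reference, a rough sketch of the options mentioned in this comment, assuming Python 3 (the three-argument `str.maketrans()` does not exist in Python 2). The `price` value is made up, and the `ensure_ascii=False` variant is an extra option from the standard `json` module that the comment itself does not mention.

```python
import json

price = "\u20ac 59,90"  # hypothetical scraped price containing a euro sign

# Default behaviour: non-ASCII characters are escaped, which is valid JSON
# and decodes back to the original string.
escaped = json.dumps({"prijs": price})
round_trip = json.loads(escaped)["prijs"]

# ensure_ascii=False keeps the character itself in the output; when writing
# this to a file, open the file with encoding="utf-8".
literal = json.dumps({"prijs": price}, ensure_ascii=False)

# Remove a single character with str.replace().
no_euro = price.replace("\u20ac", "").strip()

# Remove several characters at once with str.translate().
table = str.maketrans("", "", "\u20ac,")
no_euro_or_comma = price.translate(table).strip()
```

The escaped form is usually nothing to worry about: any JSON parser, including the one MongoDB tooling uses, turns `\u20ac` back into the euro sign.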
