Python BS4 print/write error

I am trying to write some code in Python 3 to scrape data from a website, as you can see from the code:

from bs4 import BeautifulSoup 
import urllib.request 
import sys 
headers={} 
headers['User-Agent']="Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36" 
req=urllib.request.Request('http://www.cjcyw.com/a/chuanbodangan/2015/0930/47853.html',headers=headers) 
resp=urllib.request.urlopen(req) 
xml=BeautifulSoup(resp,'html.parser') 
x=xml.findAll('dd') 
for item in x: 
    item=item.text.encode('utf-8') 
    print(sys.stdout.buffer.write(item))

The result looks like this:

[image: result1]

When I write this data to a txt file:

I used str to debug, and the real problem showed up:

[image: buggggggg]


2015-10-21 dongjian xiao

The 4.txt file shows numbers, but that is not the result I want.

Why are you using 'sys.stdout.buffer.write'? Try 'f.write(item)'.

I don't think '.encode()' is needed here.

You can use .strings here.

from bs4 import BeautifulSoup
import urllib.request

# Send a browser-like User-Agent with the request.
headers = {}
headers['User-Agent'] = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.85 Safari/537.36"
req = urllib.request.Request('http://www.cjcyw.com/a/chuanbodangan/2015/0930/47853.html', headers=headers)
resp = urllib.request.urlopen(req)
xml = BeautifulSoup(resp, 'html.parser')
x = xml.findAll('dd')

# Append the text of each <dd> element to 4.txt, one element per line.
file = open("4.txt", 'a')
for item in x:
    s = ""
    for string in item.strings:
        s += string
    s += "\n"
    file.write(s)
file.close()

All of the code has been pasted.


2015-10-21 08:43:07 uoryon

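That doesn't work, but thanks. I think it might be more helpful to run the code first.

I have run the code, and I'll paste my entire code here. Run it and you get the correct text file. – uoryon

@dongjianxiao I ran this code on my Mac. – uoryon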

First, as I said, don't use sys.stdout.buffer.write here; just use f.write(str(item)) instead.
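A likely reason numbers show up: write() returns a count (bytes for a binary stream, characters for a text stream), so wrapping the call in print() prints that count. The sketch below illustrates this with in-memory streams instead of the real page. The helper name write_items and the sample strings are made up for illustration and are not from the question.

```python
import io


def write_items(items, out):
    """Write each text item on its own line to a text-mode file object."""
    for text in items:
        out.write(text + "\n")


# write() on a binary stream returns the number of bytes written.
# Passing that return value to print() is what produces bare numbers.
raw = io.BytesIO()
count = raw.write("船舶".encode("utf-8"))  # 2 characters, 3 UTF-8 bytes each

# Writing str objects directly to a text stream needs no .encode().
buf = io.StringIO()
write_items(["船舶", "档案"], buf)
```

With a real file, the same helper could be called as write_items(texts, f), where f comes from open() in text mode.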

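Second, the default file encoding on the Chinese edition of Microsoft Windows is GBK, while the text appears to be encoded in UTF-8. You therefore need to open the file with UTF-8 encoding, like this: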

open('4.txt', 'a', encoding="utf-8")

Then try running the code again.
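As a minimal sketch of that advice, the snippet below appends a few strings to a file opened with an explicit UTF-8 encoding and reads them back. The helper name append_lines, the temporary directory, and the sample strings are assumptions for illustration; in the real script the lines would come from the 'dd' elements.

```python
import os
import tempfile


def append_lines(path, lines):
    """Append each line to path, always encoding the text as UTF-8."""
    # Passing encoding explicitly avoids depending on the platform default
    # (for example GBK on a Chinese-language Windows installation).
    with open(path, "a", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")


# Use a temporary directory so the example does not touch a real 4.txt.
path = os.path.join(tempfile.mkdtemp(), "4.txt")
append_lines(path, ["长江船舶", "2015-09-30"])

# Read the file back with the same encoding to check the round trip.
with open(path, encoding="utf-8") as f:
    content = f.read()
```

The with block also closes the file automatically, so no separate close() call is needed.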


2015-10-21 09:22:41
