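Python: pass over a character it can't encode and write the next item from a list in a loop
I am using BeautifulSoup4 and requests to scrape information from a website.
I then store the information I need in lists. There are two lists, one for each of the two types of information scraped from the page.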
try:
    for i in range(0,1000):
        location = dive_data1[((9*i)-7)].text
        locations.append(location)
        location = dive_data2[((9*i)-7)]
        locations.append(location)
        depth = dive_data1[((9*i)-6)].text
        depths.append(depth)
        depth = dive_data2[((9*i)-6)].text
        depths.append(depth)
except:
    pass
After that, I try to pass these lists to another loop that writes their contents to a CSV file.
try:
    writer = csv.writer(dive_log)
    writer.writerow(("Locations and depths"))
    writer.writerow(("Sourced from:", str(url_page)))
    writer.writerow(("Location", "Depth"))
    for i in range(len(locations)):
        writer.writerow((locations[i], depths[i]))
When I run the script I get this error:
writer.writerow((locations[i], depths[i]))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 65-66: ordinal not in range(128)
I tried to pass over the characters it can't encode like this:
writer = csv.writer(dive_log)
writer.writerow(("Locations and depths"))
writer.writerow(("Sourced from:", str(url_page)))
writer.writerow(("Location", "Depth"))
for i in range(len(locations)):
    try:
        writer.writerow((locations[i], depths[i]))
    except:
        pass
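When I run this, only the lines before the for loop take effect; every iteration of the for loop is passed over, so no data rows are written.
The full code of my script is copied below, in case the problem is related to something in the rest of it that I haven't spotted.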
import csv
from bs4 import BeautifulSoup
import requests
dive_log = open("divelog.csv", "wt")
url_page = "https://en.divelogs.de/log/Mark_Gosling"
r = requests.get(url_page)
soup = BeautifulSoup(r.content)
dive_data1 = soup.find_all("tr", {"class": "td2"})
dive_data2 = soup.find_all("td", {"class": "td"})
locations = []
depths = []
try:
    for i in range(0,1000):
        location = dive_data1[((9*i)-7)].text
        locations.append(location)
        location = dive_data2[((9*i)-7)]
        locations.append(location)
        depth = dive_data1[((9*i)-6)].text
        depths.append(depth)
        depth = dive_data2[((9*i)-6)].text
        depths.append(depth)
except:
    pass
try:
    writer = csv.writer(dive_log)
    writer.writerow(("Locations and depths"))
    writer.writerow(("Sourced from:", str(url_page)))
    writer.writerow(("Location", "Depth"))
    for i in range(len(locations)):
        try:
            writer.writerow((locations[i], depths[i]))
        except:
            pass
finally:
    dive_log.close()
print open("divelog.csv", "rt").read()
print "\n\n"
print locations
This should pass over the characters it can't encode: soup = BeautifulSoup(response.content.decode('utf-8', 'ignore')) – yedpodtrzitko
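To make the effect of the 'ignore' error handler in that suggestion concrete, here is a minimal sketch in Python 3 syntax. The sample string and both variable names are made up for illustration; they are not data from the scraped page.

```python
# Hypothetical sample text, encoded two ways; not data from the scraped page.
utf8_bytes = "Café Reef".encode("utf-8")
latin1_bytes = "Café Reef".encode("latin-1")

# Valid UTF-8 decodes completely, so 'ignore' has nothing to drop here.
kept = utf8_bytes.decode("utf-8", "ignore")

# Bytes that are not valid UTF-8 are silently discarded by 'ignore',
# so the accented letter is lost without any error being raised.
lossy = latin1_bytes.decode("utf-8", "ignore")

print(repr(kept))
print(repr(lossy))
```

If the response really is UTF-8, the handler changes nothing; if it is not, characters disappear without warning.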
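Don't ignore anything unless you can afford to lose data; find out the correct encoding and use that. The data is UTF-8 encoded anyway, so the problem lies elsewhere. Also, don't use a blanket except; catch the exceptions you expect and log or print the error. –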
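The advice about narrow exception handling could be sketched roughly as follows. This assumes Python 3, where the csv module writes text and the encoding is chosen when the file is opened; the script in the question uses Python 2 print statements, so this is not a drop-in replacement. The function name write_dive_log and the sample values are hypothetical.

```python
import csv
import io


def write_dive_log(out, locations, depths):
    """Write location/depth pairs as CSV rows to an open text stream.

    Returns the rows that could not be encoded, so nothing is lost silently.
    """
    writer = csv.writer(out)
    writer.writerow(("Location", "Depth"))
    skipped = []
    for location, depth in zip(locations, depths):
        try:
            writer.writerow((location, depth))
        except UnicodeEncodeError as exc:
            # Catch only the expected error and report it instead of hiding it.
            skipped.append((location, depth))
            print("could not encode row:", repr(location), exc)
    return skipped


# Hypothetical sample values; an in-memory stream stands in for the file.
buf = io.StringIO()
skipped = write_dive_log(buf, ["Cancún reef"], ["18 m"])
print(buf.getvalue())
```

For a real file in Python 3, the stream would be opened with something like open("divelog.csv", "w", encoding="utf-8", newline=""). With an explicit UTF-8 encoding the except branch should rarely run, and when it does, the printed message shows which row failed and why.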