"Expected string or buffer" error using Beautiful Soup

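I am trying to use Beautiful Soup to pull numbers from a URL and then sum those numbers, but I keep getting an error like the following:

expected string or buffer

I think it is related to the regular expression, but I can't pinpoint the problem.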

import re 
import urllib 

from BeautifulSoup import * 
htm1 = urllib.urlopen('https://pr4e.dr-chuck.com/tsugi/mod/python-data/data/comments_42.html').read() 
soup = BeautifulSoup(htm1) 
tags = soup('span') 

for tag in tags: 
    y = re.findall ('([0-9]+)',tag.txt) 

print sum(y)


2015-11-26 Julia_arch

I suggest using bs4 instead of BeautifulSoup (which is the old version). You also need to change this line:

y = re.findall ('([0-9]+)',tag)

to something like this:

y = re.findall ('([0-9]+)',tag.text)
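
To see why the attribute matters, here is a minimal sketch that needs no Beautiful Soup at all. It assumes that tag.txt evaluates to None (as far as I know, Beautiful Soup treats an unknown attribute as a search for a child tag of that name), and find_numbers is a made-up helper name used only for illustration:

```python
import re

PATTERN = '([0-9]+)'


def find_numbers(value):
    """Return the digit runs in value, or a message if value is not a string."""
    try:
        return re.findall(PATTERN, value)
    except TypeError as exc:
        # Python 2.7 words this "expected string or buffer";
        # Python 3 words it "expected string or bytes-like object".
        return 'TypeError: {}'.format(exc)


# Assumption: this mimics what re.findall receives when tag.txt is None.
print(find_numbers(None))
# A plain string, such as the value of tag.text, is accepted.
print(find_numbers('Score 42'))
```

re.findall only accepts a string (or bytes) as its second argument, so anything else, whether None or a Tag object, triggers the same TypeError.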

See if this gets you further:

total = 0  # initialize the running total (avoids shadowing the built-in sum)
for tag in tags:
    y = re.findall('([0-9]+)', tag.text)  # get the text from the tag
    print(y[0])  # y is a list; print its first element
    total += int(y[0])  # convert it to an integer and add it to the total

print('the sum is: {}'.format(total))
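
The loop above assumes every tag contains at least one number. The question's final line, print sum(y), has a separate problem: y only holds the matches from the last tag, and those matches are strings, which the built-in sum() cannot add to its integer start value. A rough sketch of a more forgiving accumulation follows; sum_numbers and the sample strings are made up for illustration and stand in for the tag.text values:

```python
import re


def sum_numbers(texts):
    """Add up every run of digits found in an iterable of strings."""
    total = 0
    for text in texts:
        # findall returns a (possibly empty) list of digit strings
        for match in re.findall('[0-9]+', text):
            total += int(match)  # convert each match before adding
    return total


# Hypothetical stand-ins for the text of each <span> tag.
sample_texts = ['97', '90', 'no digits here', '12 and 3']
print('the sum is: {}'.format(sum_numbers(sample_texts)))
```

With the real page, the call would presumably look like sum_numbers(tag.text for tag in tags).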


2015-11-26 01:57:09 davejagoda

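You're right: when I changed the line as you suggested, I got past the error, but the code still doesn't work as expected. I tried to install bs4 last week but couldn't get it installed properly, so I decided to stick with bs3.

Please update the code above to your current version - my testing suggests you are very close. – davejagoda

What do you mean by current version? I'm using Python 2.7 and bs3.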
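
If installing bs4 keeps failing, one possible workaround is to skip Beautiful Soup and use the parser in the standard library. This is a different technique from the one in the answer, offered only as a sketch: SpanTextCollector and sum_span_numbers are hypothetical names, and the sample markup is invented rather than copied from the real page.

```python
import re

try:
    from html.parser import HTMLParser   # Python 3
except ImportError:
    from HTMLParser import HTMLParser    # Python 2.7


class SpanTextCollector(HTMLParser):
    """Collect the text found inside <span> elements."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.depth = 0     # number of <span> tags currently open
        self.texts = []    # text chunks seen while inside a span

    def handle_starttag(self, tag, attrs):
        if tag == 'span':
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == 'span' and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.texts.append(data)


def sum_span_numbers(html):
    """Sum every run of digits that appears inside a <span>."""
    parser = SpanTextCollector()
    parser.feed(html)
    parser.close()
    return sum(int(n) for text in parser.texts
               for n in re.findall('[0-9]+', text))


# Invented markup for illustration only.
sample_html = ('<tr><td>Ann</td><td><span class="comments">97</span></td></tr>'
               '<tr><td>Bob</td><td><span class="comments">3</span></td></tr>')
print(sum_span_numbers(sample_html))
```

To run it against the page from the question, the downloaded HTML would be passed in as a string; on Python 3 the bytes returned by read() would need decoding first.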
