I'm learning BeautifulSoup in Python and trying to parse the site "https://www.twitteraudit.com/". When I enter a Twitter ID in the search bar, some IDs return a result within a few seconds, but others take about a minute to process the data. In that case, how can I parse the HTML only after the result has finished loading? I tried looping over it, but that didn't work. One thing I noticed: if I open the link in a browser and let it finish loading, the result gets cached, and the next time I run the same ID the script works perfectly.
Can anyone help me with this? I'd appreciate the help. My code is attached below:
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
import re
from re import sub

def HTML(myURL):
    uClient = uReq(myURL)
    pageHTML = uClient.read()
    uClient.close()
    pageSoup = soup(pageHTML, "html.parser")
    return pageSoup

def fakecheck(usr):
    myURLfc = "https://www.twitteraudit.com/" + usr
    pgSoup = HTML(myURLfc)
    foll = pgSoup.findAll("div", {"class": "audit"})
    link = foll[0].div.a["href"]
    real = foll[0].findAll("span", {"class": "real number"})[0]["data-value"]
    fake = foll[0].findAll("span", {"class": "fake number"})[0]["data-value"]
    scr = foll[0].findAll("div", {"class": "score"})[0].div
    scoresent = scr["class"][1]
    score = re.findall(r'\d{1,3}', str(scr))[0]
    return [link, real, fake, scoresent, score]

lis = ["BarackObama", "POTUS44", "ObamaWhiteHouse", "MichelleObama", "ObamaFoundation", "NSC44", "ObamaNews", "WhiteHouseCEQ44", "IsThatBarrak", "obama_barrak", "theprezident", "barrakubama", "BarrakObama", "banackkobama", "YusssufferObama", "barrakisdabomb_", "BarrakObmma", "fuzzyjellymasta", "BarrakObama6", "bannalover101", "therealbarrak", "ObamaBarrak666", "barrak_obama"]

for u in lis:
    link, real, fake, scoresent, score = fakecheck(u)
    print("link : " + link)
    print("Real : " + real)
    print("Fake : " + fake)
    print("Result : " + scoresent)
    print("Score : " + score)
    print("=================")
Is some of the data not coming back? I ran your code and got results for all 23 queries; it seems to work fine. – davedwards
Thanks for the reply... just replace the lis values with these and you'll see the problem..... lis = ["TomCruise", "TomCruiseFanCom", "TomCruiseBRCom", "TheAmyNicholson", "TomCruiseIndo", "MissionFilm", "JackReacher", "Not_TomCruise", "Pompey_Dave", "tomcruiseblog", "cubanalaf", "JustinMeliNY", "rivergyllenhaal", "eddiehamilton", "TomCruiseActor"] –
I see — the script exits when the site reports no result yet. If you put the for-loop inside a while True: loop, it stalls on whichever ID has no result; meanwhile I went and requested the audit page for that ID myself, and once the result became available the script continued. Would that solve it? – davedwards
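The idea in the comment above — re-requesting the page until the audit result appears — can be sketched as a simple polling loop. This is a minimal sketch, not the original poster's code: the helper names (`extract_audit`, `wait_for_audit`) and the retry/delay parameters are my own assumptions, and it assumes the result is ready once a `div` with class `audit` is present in the returned HTML.

```python
import time
from urllib.request import urlopen
from bs4 import BeautifulSoup

def extract_audit(page_html):
    """Return the audit <div> if the result is present in the HTML, else None."""
    page = BeautifulSoup(page_html, "html.parser")
    return page.find("div", {"class": "audit"})

def wait_for_audit(usr, retries=10, delay=6):
    """Re-request the page for one Twitter ID until the audit result
    appears, sleeping between attempts; give up after `retries` tries.
    (retries/delay values are illustrative, not tuned for the real site.)"""
    url = "https://www.twitteraudit.com/" + usr
    for _ in range(retries):
        audit = extract_audit(urlopen(url).read())
        if audit is not None:      # result is ready: hand it back
            return audit
        time.sleep(delay)          # not ready yet: wait, then retry
    return None                    # still no result after all retries
```

With a helper like this, `fakecheck` could call `wait_for_audit(usr)` instead of indexing `foll[0]` directly, and skip (or log) any ID that still returns `None` instead of crashing with an `IndexError`.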