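Python 3, BeautifulSoup 4: scraping and printing text from a specific part of the parse tree

I have searched here and have not found a post that helps with what I need.

Website: http://www.animefansftw.com/

I'm trying to get the h1 titles of only the posts from the latest date, not of every post! I can get the dates of the posts that match my set date, but I'm stuck on how to get the h1 titles of those posts.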

import time
import requests
from bs4 import BeautifulSoup

# Today's date as month name plus day, e.g. "August 14"
Aniday = time.strftime("%B %d")
r = requests.get("http://www.animefansftw.com")
soup = BeautifulSoup(r.content, "html.parser")
print("Today's Animu Crack:\n")

for div in soup.find_all("div", {"class": "date"}):
    get_date = div.text
    # Collapse runs of whitespace so the text compares cleanly
    clean_date = " ".join(get_date.split())
    if clean_date == Aniday:
        print(clean_date)
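One thing to watch in the comparison above: `%d` zero-pads the day, so on the 5th `time.strftime("%B %d")` gives "August 05". Whether the site writes "August 5" or "August 05" is not shown in this thread, so the sketch below is only a precaution under that assumption. The helper names `month_day` and `normalise` are made up for illustration.

```python
import datetime


def month_day(d):
    """Format a date as e.g. 'August 5', without zero padding (hypothetical helper)."""
    return "{} {}".format(d.strftime("%B"), d.day)


def normalise(text):
    """Collapse whitespace and drop a leading zero from the day number."""
    parts = text.split()
    if len(parts) == 2 and parts[1].isdigit():
        parts[1] = str(int(parts[1]))
    return " ".join(parts)


sample = datetime.date(2015, 8, 5)
print(sample.strftime("%B %d"))  # zero-padded day
print(month_day(sample))         # day without padding
print(normalise(" August   05 ") == month_day(sample))
```

Running both the scraped text and today's date through the same normalisation means the comparison no longer depends on which form the page uses.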

To avoid confusion: I can get the posts' h1 titles just fine, but I don't want all of them, only the ones from posts that carry my set date.

for item in soup.find_all("h1"): 
    info = item.text 
    clean_info = " ".join(info.split()) 
    print(clean_info)


2015-08-14 Yami

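Is the date element perhaps inside each post's div on the page? If so, you can loop with a conditional and print only the elements for the matching date. If the posts contain no date element, then I don't know how to do this, unless the HTML has some kind of timestamp that Beautiful Soup can extract.

Looking at the page source, it looks like the date div's grandparent (its parent's parent) contains the h1 tag.

Try: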

import time
import requests
from bs4 import BeautifulSoup

Aniday = time.strftime("%B %d")
r = requests.get("http://www.animefansftw.com")
soup = BeautifulSoup(r.content, "html.parser")
print("Today's Animu Crack:\n")

for div in soup.find_all("div", {"class": "date"}):
    get_date = div.text
    clean_date = " ".join(get_date.split())
    if clean_date == Aniday:
        # The date div's grandparent is the element that holds the post's h1
        post_div = div.parent.parent
        # Drop non-ASCII characters; decode back to str so Python 3 doesn't print b'...'
        title = post_div.h1.text.encode('ascii', 'ignore').decode('ascii')
        print("{title}\n{date}\n".format(title=title, date=clean_date))
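To try the `div.parent.parent` idea without hitting the live site, here is a minimal sketch against an inline HTML string. The markup is invented for illustration (the real page structure is not reproduced in this thread), and `titles_for_date` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an assumption, not copied from the real site.
SAMPLE_HTML = """
<div class="post">
  <div class="meta"><div class="date">August 14</div></div>
  <h1><a href="/post-one">First post</a></h1>
</div>
<div class="post">
  <div class="meta"><div class="date">August 13</div></div>
  <h1><a href="/post-two">Older post</a></h1>
</div>
"""


def titles_for_date(html, wanted):
    """Return the h1 text of every post whose date div matches `wanted`."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for div in soup.find_all("div", class_="date"):
        # Collapse whitespace before comparing, as in the code above
        if " ".join(div.get_text().split()) != wanted:
            continue
        post = div.parent.parent  # two levels up from the date div
        # Skip posts where the assumed structure does not hold
        if post is not None and post.h1 is not None:
            titles.append(" ".join(post.h1.get_text().split()))
    return titles


print(titles_for_date(SAMPLE_HTML, "August 14"))
```

The `None` checks are there because `post.h1` is `None` when the grandparent has no h1, which would otherwise raise an `AttributeError` on `.text`.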


2015-08-14 23:24:35 clockwatcher

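Thank you for your help and your time. I'm still new to Python and was wondering where I can read up on parents. – Yami

The Beautiful Soup documentation covers parents in the "Going up" section of its chapter on navigating the tree: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#going-up – clockwatcher

Thanks, you've been a great help. I've learned so much and managed to get the posts' URLs :) along with some other improvements: http://pastebin.com/vraUki4u – Yami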
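The same "Going up" section also describes `.parents`, which iterates over every ancestor of a tag. A rough sketch of how that could replace a hard-coded `.parent.parent`, assuming invented markup where the date sits three levels deep; `nearest_ancestor_with` is a hypothetical helper, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with deeper nesting than two levels (an assumption).
SAMPLE_HTML = """
<div class="post">
  <div class="meta"><div class="info"><div class="date">August 14</div></div></div>
  <h1>Deeper post</h1>
</div>
"""


def nearest_ancestor_with(tag, name):
    """Return the closest ancestor of `tag` that contains a <name> element, or None."""
    for ancestor in tag.parents:
        if ancestor.find(name) is not None:
            return ancestor
    return None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
date_div = soup.find("div", class_="date")
post = nearest_ancestor_with(date_div, "h1")
print(post.get("class"), post.h1.get_text(strip=True))
```

This keeps working if the nesting depth changes, but note that it walks all the way up to the document root, so a date div outside any post could match an unrelated h1 elsewhere on the page.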
