How to get rid of the blank space above the text, using bs4


OK, so I am using bs4 (BeautifulSoup) to parse a website and find the specific titles I am looking for. My code looks like this:

import requests
from bs4 import BeautifulSoup

url = 'http://www.ewn.co.za/Categories/Local'
r = requests.get(url).text
soup = BeautifulSoup(r)
for i in soup.find_all(class_='article-short'):
    if i.a:
        print(i.a.text.replace('\n', '').strip())
    else:
        print(i.contents[0].strip())

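This code works, but its output starts with about 20 blank lines before the titles from the website are printed. What is wrong with my code, or what can I do to get rid of the blank lines?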

Source

2016-05-14 raid3r

With the strip function you can remove leading and trailing whitespace from a string (https://docs.python.org/3/library/stdtypes.html#str.strip) – Querenker
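To make the comment above concrete, here is a small sketch of what str.strip does. The sample strings are made up for illustration. Note that a string containing only whitespace becomes empty, and print() on an empty string still emits a line break, which is one way blank lines can end up in the output.

```python
# Illustrative strings only; they are not taken from the real page.
raw = "\n\n   Contralesa against scrapping initiation due to cold weather \n"
cleaned = raw.strip()  # removes leading and trailing whitespace, keeps inner spaces
print(repr(cleaned))

only_whitespace = "\n  \n"
empty = only_whitespace.strip()  # nothing is left once the whitespace is removed
print(repr(empty))

# print(empty) would still output a newline, i.e. a visible blank line.
```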

Because you have content like this:

<article class="article-short"> 
<div class="thumb"><a href="http://ewn.co.za/2016/05/14/Contralesa-against-scrapping-initiation-due-to-cold-weather"><img alt="FILE: Boys who have undergone a circumcision ceremony walk near Qunu in the Eastern Cape in 2013. Picture: AFP." height="147" src="http://ewn.co.za/cdn/-%2fmedia%2f3C37CB28056746CD95FC913757AAD41C.ashx%3fas%3d1%26h%3d147%26w%3d234%26crop%3d1;waeb9b8157b3e310df" width="234"/></a></div> 
<h6 class="h6-mega"><a href="http://ewn.co.za/2016/05/14/Contralesa-against-scrapping-initiation-due-to-cold-weather">Contralesa against scrapping initiation due to cold weather</a></h6> 
</article>

Here the first link contains only an image and no text, so i.a.text is empty for it.
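This can be checked offline with a trimmed copy of the markup above. In the sketch below the URLs and the alt text are placeholders (assumptions, not the real values), and 'html.parser' is chosen only because it ships with Python. The first a tag wraps the thumbnail, so its text is empty, while the h6 tag holds the headline.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the article markup; URLs and alt text are placeholders.
html = """
<article class="article-short">
<div class="thumb"><a href="http://example.com/story"><img alt="placeholder" src="http://example.com/img.jpg"/></a></div>
<h6 class="h6-mega"><a href="http://example.com/story">Contralesa against scrapping initiation due to cold weather</a></h6>
</article>
"""

soup = BeautifulSoup(html, "html.parser")
article = soup.find(class_="article-short")

# article.a is the first <a> descendant, which is the thumbnail link.
first_link_text = article.a.text.strip()
# article.h6 is the heading that contains the headline link.
heading_text = article.h6.text.strip()

print(repr(first_link_text))
print(repr(heading_text))
```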

You should look for the h6 tag instead. So something like this works:

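import requests
from bs4 import BeautifulSoup

url = 'http://www.ewn.co.za/Categories/Local'
r = requests.get(url).text
soup = BeautifulSoup(r, 'html.parser')  # name the parser explicitly
for i in soup.find_all(class_='article-short'):
    # Read the headline from the h6 tag; fall back to the first child node
    title = (i.h6.text.replace('\n', '') if i.h6 else i.contents[0]).strip()
    # Skip articles whose title is empty after stripping
    if title:
        print(title)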
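If you want to try the same idea without hitting the site, a rough sketch could look like the following. The helper name extract_titles and the sample markup are hypothetical, and unlike the code above it simply skips articles that have no h6 instead of falling back to contents[0]. It uses get_text(strip=True), which strips the whitespace as part of extracting the text.

```python
from bs4 import BeautifulSoup


def extract_titles(html):
    """Return the non-empty h6 headlines of all 'article-short' elements.

    Hypothetical helper for illustration; not part of the original answer.
    """
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for article in soup.find_all(class_="article-short"):
        if article.h6 is None:
            # No heading at all: skip rather than print a blank line.
            continue
        title = article.h6.get_text(strip=True)
        if title:
            titles.append(title)
    return titles


# Made-up sample: the second article has a thumbnail but no heading.
sample = """
<article class="article-short">
<div class="thumb"><a href="#"><img alt="" src="a.jpg"/></a></div>
<h6 class="h6-mega"><a href="#">First headline</a></h6>
</article>
<article class="article-short">
<div class="thumb"><a href="#"><img alt="" src="b.jpg"/></a></div>
</article>
"""

result = extract_titles(sample)
print(result)
```

Returning a list instead of printing inside the loop makes it easy to check the result before wiring it up to requests.get(url).text.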

Source

2016-05-14 13:44:05 aldanor

Thanks! @aldanor it works much better now! – raid3r
