2016-04-26

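How do I get only the useful part of the description from an RSS feed item?

I'm using Python's feedparser to get this item['description'] from the Mashable feed: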

<img alt="9f4397d9c05e474fa54291507ad9c03a" src="http://rack.2.mshcdn.com/media/ZgkyMDE2LzA0LzI2LzM0LzlmNDM5N2Q5YzA1LjMzODI0LmpwZwpwCXRodW1iCTU3NXgzMjMjCmUJanBn/393b8db2/53c/9f4397d9c05e474fa54291507ad9c03a.jpg" /> 
<div style="float: right; width: 50px;"><a href="http://twitter.com/share?via=Mashable&amp;text=Nail+polish+stockings+are+exactly+what+you+need+for+a+lazy+summer+pedicure&amp;src=http%3A%2F%2Fmashable.com%2F2016%2F04%2F26%2Ftoe-nail-polish-stockings%2F" style="margin: 10px;"><img alt="Feed-tw" border="0" src="http://rack.1.mshcdn.com/assets/feed-tw-f7c0a094d16b7ee7c91a1e50839a8e00.jpg" /></a><a href="http://www.facebook.com/sharer.php?u=http%3A%2F%2Fmashable.com%2F2016%2F04%2F26%2Ftoe-nail-polish-stockings%2F&amp;src=sp" style="margin: 10px;"><img alt="Feed-fb" border="0" src="http://rack.1.mshcdn.com/assets/feed-fb-c0a21e8841794479b8086c32c6f24ba1.jpg" /></a></div> 
<div> 
    <p>Say goodbye messy pedicures and hello to finally feeling the sweet freedom of open toed shoes in summer.</p> 
    <p>Japanese fashion company <a href="http://www.bellemaison.jp/cpg/fashion/fakenail/fakenail_index.html">Belle Maison</a> has a time saving solution for those of us out there who have little time and little hand coordination for painting our toenails &#8212; thin stockings with pre-painted toenails.</p> 
    <div> 
     <p>SEE ALSO: <a href="http://mashable.com/2016/02/23/weiner-dog-ear-plugs/">Weiner dog ear plugs will help you sleep deeper than a newborn pup</a></p> 
    </div> 
    <figure> 
     <p><img class="" src="http://rack.1.mshcdn.com/media/ZgkyMDE2LzA0LzI2L2M1L3RvZW5haWxhcnRwLjI4NjBiLmpwZwpwCXRodW1iCTU3NXg0MDk2Pg/4f07495a/b32/toe-nail-art-polish-stockings-japan-10.jpg" /></p> 
     <div> 
      <p>Image: belle maison</p> 
     </div> 
    </figure> 
    <p>If you're worried about looking a little out-of-date with the classic stockings and open-toed heels that your grandma used to wear, don't fret. The stockings are designed to fit individual toes, giving your pedicure a better fit as well. <a href="http://mashable.com/2016/04/26/toe-nail-polish-stockings/">Read more...</a></p> 
</div> 
More about <a href="http://mashable.com/conversations/?utm_campaign=Mash-Prod-RSS-Feedburner-All-Partial&amp;utm_cid=Mash-Prod-RSS-Feedburner-All-Partial">Conversations</a>, <a href="http://mashable.com/pics/?utm_campaign=Mash-Prod-RSS-Feedburner-All-Partial&amp;utm_cid=Mash-Prod-RSS-Feedburner-All-Partial">Pics</a>, <a href="http://mashable.com/category/products/?utm_campaign=Mash-Prod-RSS-Feedburner-All-Partial&amp;utm_cid=Mash-Prod-RSS-Feedburner-All-Partial">Products</a>, <a href="http://mashable.com/lifestyle/?utm_campaign=Mash-Prod-RSS-Feedburner-All-Partial&amp;utm_cid=Mash-Prod-RSS-Feedburner-All-Partial">Lifestyle</a>, and <a href="http://mashable.com/category/weird-products/?utm_campaign=Mash-Prod-RSS-Feedburner-All-Partial&amp;utm_cid=Mash-Prod-RSS-Feedburner-All-Partial">Weird Products</a> 

That's an awful lot of information. The only part I actually need for my readers is this:

<p>Say goodbye messy pedicures and hello to finally feeling the sweet freedom of open toed shoes in summer.</p> 
<p>Japanese fashion company <a href="http://www.bellemaison.jp/cpg/fashion/fakenail/fakenail_index.html">Belle Maison</a> has a time saving solution for those of us out there who have little time and little hand coordination for painting our toenails &#8212; thin stockings with pre-painted toenails.</p> 

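How can I get just this part? Should I simply use Python regular expressions? I'm not sure, because almost every description is different, so writing one expression for all of them would be hard. Is there another RSS item element that provides only the information I want? Thanks!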

Answers


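As you correctly guessed, regular expressions are not the right tool for this job (obligatory link to this question). Your best option is to feed the HTML to a parser such as BeautifulSoup and then write your logic against the parsed DOM object.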

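from bs4 import BeautifulSoup
# my_input_html_string holds the entry's description, e.g. item['description']
soup = BeautifulSoup(my_input_html_string, "html.parser")
# Keep only the first two <p> elements in the document
my_elements = soup.find_all('p')[0:2]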

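Obviously, this code assumes that you always want the first two <p> elements of whatever DOM you give it. You will have to adapt the logic to whatever consistent patterns you find by looking at the different descriptions in your input.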
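Because the descriptions differ, a fixed slice like [0:2] can grab the wrong paragraphs. The sketch that follows shows one way to encode a structural pattern instead. It assumes, based only on the single Mashable entry in the question, that the article body is the first <div> without attributes and that the wanted paragraphs are its direct <p> children that come before the first non-<p> block (the "SEE ALSO" <div>). The function name extract_lead_paragraphs is hypothetical.

```python
from bs4 import BeautifulSoup, Tag


def extract_lead_paragraphs(description_html):
    """Return the leading <p> elements of the article body as HTML strings.

    Heuristic (an assumption drawn from one sample entry): the body is the
    first <div> with no attributes, and the lead paragraphs are its direct
    <p> children up to the first child element that is not a <p>.
    """
    soup = BeautifulSoup(description_html, "html.parser")
    # Skip the share-button <div style="..."> and pick the first bare <div>
    body = next((div for div in soup.find_all("div") if not div.attrs), None)
    if body is None:
        return []

    paragraphs = []
    for child in body.children:
        if not isinstance(child, Tag):
            continue  # ignore whitespace text nodes between tags
        if child.name != "p":
            break  # stop at the first non-paragraph block, e.g. "SEE ALSO"
        paragraphs.append(str(child))
    return paragraphs
```

On the description in the question, this stops at the "SEE ALSO" <div>, so it returns the two paragraphs the question asks for. If other entries are laid out differently, the heuristic has to be adjusted.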


If you want to go the re route, you can do the following:

import re

# re.DOTALL lets .*? match across the newlines inside the first bare <div>
pat = re.compile(r"<div>(.*?)</div>", re.DOTALL)
s = pat.search(html).group(1)
result = [line.strip() for line in s.strip().splitlines()[:2]]
# result
['<p>Say goodbye messy pedicures and hello to finally feeling the sweet freedom of open toed shoes in summer.</p>',
'<p>Japanese fashion company <a href="http://www.bellemaison.jp/cpg/fashion/fakenail/fakenail_index.html">Belle Maison</a> has a time saving solution for those of us out there who have little time and little hand coordination for painting our toenails &#8212; thin stockings with pre-painted toenails.</p>']

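But as you can see, this is dirty and easy to break. One alternative is to write a grammar and a small parser, but the robust and convenient way is to use a parser such as BeautifulSoup or lxml.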
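For the "small parser" route, here is a minimal sketch built on the standard library's html.parser.HTMLParser, so it needs no third-party package. It collects the plain text of every <p> element in document order, which mirrors the find_all('p')[0:2] selection from the BeautifulSoup answer. The class ParagraphCollector and the function first_paragraphs are hypothetical names, and the sketch returns text rather than markup.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the plain text of every <p> element, in document order."""

    def __init__(self):
        # convert_charrefs=True decodes references such as &#8212; in the text
        super().__init__(convert_charrefs=True)
        self.paragraphs = []
        self._in_p = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self._in_p = False
            # Collapse runs of whitespace and store the finished paragraph
            self.paragraphs.append(" ".join("".join(self._buffer).split()))

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)  # text inside <a> etc. is kept too


def first_paragraphs(description_html, count=2):
    """Return the text of the first `count` <p> elements."""
    parser = ParagraphCollector()
    parser.feed(description_html)
    parser.close()
    return parser.paragraphs[:count]
```

Because links are flattened to their text, this suits a reader that only displays plain text; if you need the original <p> markup, a full parser such as BeautifulSoup is the easier choice.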