BeautifulSoup: ignore inner HTML

I have the following HTML, and I only want to get the product name while ignoring the rest of the HTML. How can I do this?

I am using BeautifulSoup and want this as the output:

Apple iPhone 4 Verizon

<h1 itemprop="itemreviewed">Apple iPhone 4 Verizon  
         <div class="right"> 
    <span class="s_button_follow_special" style="display: block"> 
    <a href="javascript:;" style="display: block" onclick="subscribe(this, 1, 5132);" class="follow_1_5132 s_button_2 s_button_follow" title="Follow Apple iPhone 4 Verizon"><em class="s_icon s_icon_follow"></em>Follow</a> 
    <a class="s_button_2 s_button_follow_arrow" href="javascript:;" onclick="subscribe(this, 1, 5132, '', 2);"></a> 
    </span> 
    <a href="javascript:;" style="display: none" onclick="subscribe(this, 1, 5132);" class="unfollow_1_5132 s_button_2 s_button_follow_disabled s_button_following" title="Unfollow Apple iPhone 4 Verizon"><span><em class="s_icon s_icon_following"></em>Following</span></a> 
    </div> 
    </h1> 


    header= soup('h1', {'itemprop' : 'itemreviewed'})


2012-07-31 Rajeev

My example – Rajeev 2012-07-31 13:57:02

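The Apple iPhone 4 Verizon text is its own element in the parse tree, independent of any other; you can select it by getting a nearby element and navigating with nextSibling, previousSibling, next, or previous.

So this should work: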

header = soup.find('h1', itemprop='itemreviewed') 
text = header.next
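
For anyone on the newer bs4 package, here is a minimal sketch of the same idea. It assumes bs4 is installed and uses a shortened copy of the question's markup; the variable names product_name and direct_text are only illustrative. In bs4 the navigation attribute is spelled next_element, and the second variant is an alternative that collects only the text nodes sitting directly inside the h1.

```python
from bs4 import BeautifulSoup

# Shortened version of the markup from the question (assumption: the
# real page has the same shape, text first and then a nested <div>).
html = """
<h1 itemprop="itemreviewed">Apple iPhone 4 Verizon
    <div class="right">
        <a href="javascript:;" title="Follow Apple iPhone 4 Verizon">Follow</a>
    </div>
</h1>
"""

soup = BeautifulSoup(html, "html.parser")
header = soup.find("h1", itemprop="itemreviewed")

# next_element is the node parsed right after the <h1> start tag,
# which here is the text node that comes before the nested <div>.
product_name = header.next_element.strip()

# Alternative: keep only the text nodes that are direct children of
# <h1>, so text inside the nested <div> and <a> is skipped.
direct_text = "".join(header.find_all(string=True, recursive=False)).strip()

print(product_name)
print(direct_text)
```

The first variant depends on the text coming before any child tag; the second does not, so it may be the safer choice if the markup order can vary.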


2012-07-31 13:57:51

Something like:

soup = BeautifulSoup(<h1 ....)
header = soup.h1.contents


2012-07-31 13:50:24 Alexander

I think '.contents' will get all of the tag's contents, including all the HTML such as the div and so on. You could try '.contents[0]' to get the first element. – 2012-07-31 13:58:18

You are right, contents returns a list. – Alexander 2012-08-01 06:31:11
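
To make the point in these comments concrete, here is a small sketch of what .contents holds for markup shaped like the question's. It assumes the bs4 package and a shortened copy of the HTML; the names children and name are only illustrative.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Shortened version of the markup from the question.
html = """
<h1 itemprop="itemreviewed">Apple iPhone 4 Verizon
    <div class="right"><a href="javascript:;">Follow</a></div>
</h1>
"""

soup = BeautifulSoup(html, "html.parser")

# .contents is a list of the tag's direct children: text nodes and
# nested tags mixed together, in document order.
children = soup.h1.contents

# The first child is the leading text node, so index 0 gives the
# product name once the surrounding whitespace is stripped.
name = str(children[0]).strip()

# The nested <div> is also in the list, which is why .contents on its
# own returns more than just the name.
nested_tags = [child.name for child in children if isinstance(child, Tag)]

print(name)
print(nested_tags)
```

Indexing with [0] only works when the text really is the first child, as it is in the question's HTML.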
