2012-03-12 51 views
3

Beautiful Soup: getting text by class from an HTML file

I have an HTML file, and I want to grab the text from a block like the following:

<strong class="fullname js-action-profile-name">User Name</strong> 
    <span>&rlm;</span> 
    <span class="username js-action-profile-name"><s>@</s><b>UserName</b></span> 

I want it displayed as:

User Name 
@UserName 

How can I do this with Beautiful Soup?

Answers

1

Use the "text" attribute. For example:

>>> import BeautifulSoup
>>> b = BeautifulSoup.BeautifulStoneSoup(open('/tmp/x.html'), convertEntities=BeautifulSoup.BeautifulStoneSoup.HTML_ENTITIES)

>>> print b.find(attrs={"id": "container"}).text 
User Name‏@UserName 

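In x.html I have a div with the ID "container" that contains the HTML you provided. Note that I use BeautifulStoneSoup so that &rlm; is converted to \u200f. To insert a line break (which a browser would not introduce), just replace u'\u200f' with '\n'.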
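
The replacement step described above can be sketched with plain strings, no parser needed. This is a minimal illustration in Python 3 syntax; the names `raw` and `lines` are made up for the example, and `raw` simply reproduces the text printed in the session above.

```python
# Text as shown in the answer's output: the two names joined by U+200F,
# the invisible right-to-left mark that &rlm; decodes to.
raw = "User Name\u200f@UserName"

# Swap the mark for a newline so each name lands on its own line.
lines = raw.replace("\u200f", "\n")

print(lines)
```

If the mark is absent from the text, `replace` returns the string unchanged, so the call is safe to apply unconditionally.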

0

This assumes index.html contains the markup from the question:

import BeautifulSoup 

def displayUserInfo(): 

    soup = BeautifulSoup.BeautifulSoup(open("index.html")) 
    fullname_ele = soup.find(attrs={"class": "fullname js-action-profile-name"}) 
    fullname = fullname_ele.contents[0] 
    print fullname 

    username_ele = soup.find(attrs={"class": "username js-action-profile-name"}) 
    username = "" 
    # Join the text of the <s> and <b> children: "@" + "UserName".
    for child in username_ele.findChildren():
        username += child.contents[0]
    print username 

if __name__ == '__main__': 
    displayUserInfo() 

# prints: 
# User Name 
# @UserName 

1

from bs4 import BeautifulSoup 

html = '''<strong class="fullname js-action-profile-name">User Name</strong> 
    <span>&rlm;</span> 
    <span class="username js-action-profile-name"><s>@</s><b>UserName</b></span>''' 

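# Name the parser explicitly so bs4 does not have to guess one.
soup = BeautifulSoup(html, 'html.parser')

username = soup.find(attrs={'class':'username js-action-profile-name'}).text
fullname = soup.find(attrs={'class':'fullname js-action-profile-name'}).text

# print() with a single argument works on both Python 2 and Python 3.
print(fullname)
print(username)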

Output:

User Name 
@UserName 

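Two things to note:

  1. Use bs4 if you are starting something new or are just learning Beautiful Soup.

  2. You will probably load your HTML from an external file, so replace html with a file object.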
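
The second point could look roughly like the sketch below, assuming Python 3 with bs4 installed. The helper `read_names` and the temporary file are hypothetical, added only so the example runs on its own; in practice you would pass the path to your own HTML file. It also matches on a single class name through the `class_` keyword rather than the full attribute string, which is a swapped-in variation on the code above.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Markup from the question, written to a temporary file so the sketch
# is self-contained. A real script would already have this file on disk.
HTML = '''<strong class="fullname js-action-profile-name">User Name</strong>
    <span>&rlm;</span>
    <span class="username js-action-profile-name"><s>@</s><b>UserName</b></span>'''


def read_names(path):
    """Return (fullname, username) parsed from the HTML file at path."""
    # BeautifulSoup accepts an open file object in place of a string.
    with open(path, encoding="utf-8") as handle:
        soup = BeautifulSoup(handle, "html.parser")
    fullname = soup.find(class_="fullname").get_text()
    username = soup.find(class_="username").get_text()
    return fullname, username


with tempfile.NamedTemporaryFile(
    "w", suffix=".html", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(HTML)
    tmp_path = tmp.name

try:
    fullname, username = read_names(tmp_path)
finally:
    os.remove(tmp_path)  # clean up the temporary file

print(fullname)
print(username)
```

Opening the file in a `with` block closes the handle as soon as parsing is done, which the bare `open(...)` calls in the earlier answers leave to the interpreter.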