2012-02-07

Parsing HTML with BeautifulSoup when child and parent divs have the same class

I have HTML like this that I'm extracting data from:

<html> 
<head></head> 
<body> 
<div class="main"> 
    <div class="utlimate"><p>hello</p></div> 
    <div class = "headline"><p>some text</p></div> 
    <div class="content"> 
    <div class = "utimate"> <p>TOP</p> 
     <div class ="utlimate"> <p>data1</p></div> 
     <div class ="utlimate"> <p>it could be anything</p></div> 
     <div class ="utlimate"> <p>not</p></div> 
     <div class ="utlimate"> <p></p></div> 

    </div> 
    </div> 
</div> 
</body> 
</html> 

I need to access the <div class="ultimate"><p> elements that hold the values "data1", "it could be anything", and "not". The code I tried:

soup = BeautifulSoup(HTML_data)  #HTML_data is all html content 
first_div = soup.find('div',{"class" : "content"}) 
second_div = first_div.find('div',{"class" : "utlimate"}) 
div_list = second_div.findall('div',{"class" : "utlimate"}) 

On the last line of my code I get the error: 'NoneType' object is not callable.

How can I access only those divs? Please help.

Answers

2

Try this:

soup = BeautifulSoup(HTML_data)  #HTML_data is all html content 
first_div = soup.find('div',{"class" : "content"}) 
second_div = first_div.find('div',{"class" : "utimate"}) 
div_list = second_div.findAll('div',{"class" : "utlimate"}) 

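The method that returns a list is findAll, not findall. BeautifulSoup treats an unknown attribute such as findall as a lookup for a child tag with that name, so second_div.findall evaluates to None, and calling it raises the 'NoneType' object is not callable error. Also, there is no "ultimate" class in the HTML snippet; the classes are spelled "utlimate" or "utimate". Are those typos?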
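For reference, here is a minimal sketch of the same lookup as a self-contained script. It assumes the newer bs4 package (where find_all is the preferred spelling of findAll) and Python's built-in "html.parser" backend, neither of which appears in the original post. The names content_div, outer_div, inner_divs, values, and non_empty are illustrative only. Passing recursive=False is a design choice that limits the search to direct children of the outer div.

```python
from bs4 import BeautifulSoup  # assumption: beautifulsoup4 is installed

# The question's markup, keeping its class spellings ("utimate" / "utlimate").
HTML_data = """
<div class="main">
    <div class="utlimate"><p>hello</p></div>
    <div class="headline"><p>some text</p></div>
    <div class="content">
    <div class="utimate"><p>TOP</p>
     <div class="utlimate"><p>data1</p></div>
     <div class="utlimate"><p>it could be anything</p></div>
     <div class="utlimate"><p>not</p></div>
     <div class="utlimate"><p></p></div>
    </div>
    </div>
</div>
"""

soup = BeautifulSoup(HTML_data, "html.parser")

# Step down from the "content" div to the outer "utimate" div.
content_div = soup.find("div", {"class": "content"})
outer_div = content_div.find("div", {"class": "utimate"})

# Collect only the divs that are direct children of the outer div.
inner_divs = outer_div.find_all("div", {"class": "utlimate"}, recursive=False)

# Read the text of each <p>, then drop the empty one.
values = [div.p.get_text(strip=True) for div in inner_divs]
non_empty = [v for v in values if v]
print(non_empty)
```

The top-level "hello" div is skipped because the search starts inside the "content" div rather than at the document root.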

1

Is soup None?

I suggest you refactor the code to guard against this:

soup = BeautifulSoup(HTML_data)  # HTML_data is all html content
if soup is None:
    # Error: nothing was parsed
    raise ValueError("could not parse HTML_data")
else:
    c = soup.contents
    # Use RE here
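The same defensive idea can be applied to each find call, since find returns None when nothing matches. The sketch below is one possible way to do that, not the answerer's code. It assumes the bs4 package and the "html.parser" backend, and find_required is a hypothetical helper introduced only for illustration.

```python
from bs4 import BeautifulSoup  # assumption: beautifulsoup4 is installed


def find_required(parent, name, class_name):
    """Hypothetical helper: return the first matching tag or raise LookupError."""
    tag = parent.find(name, {"class": class_name})
    if tag is None:
        # find() signals "no match" with None, so fail here with a clear message.
        raise LookupError(f"no <{name}> with class {class_name!r} found")
    return tag


soup = BeautifulSoup(
    '<div class="content"><div class="utimate"></div></div>', "html.parser"
)
content_div = find_required(soup, "div", "content")

try:
    # "ultimate" does not occur in the markup, so this lookup fails.
    find_required(content_div, "div", "ultimate")
except LookupError as exc:
    message = str(exc)
    print(message)
```

Failing at the lookup that returned nothing points directly at the misspelled class name, instead of surfacing later as an error on None.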