How to count the number of opening and closing tags in HTML with Python

ya.html

<div class="side-article txt-article"> 
<p> 
    <strong> 
    </strong> 
    <a href="http://batam.tribunnews.com/tag/polres/" title="Polres"> 
    </a> 
    <a href="http://batam.tribunnews.com/tag/bintan/" title="Bintan"> 
    </a> 
</p> 
<p> 
    <br> 
</p> 
<p> 
    <a href="http://batam.tribunnews.com/tag/polres/" title="Polres"> 
    </a> 
</p> 
<p> 
    <a href="http://batam.tribunnews.com/tag/polres/" title="Polres"> 
    </a> 
    <a href="http://batam.tribunnews.com/tag/bintan/" title="Bintan"> 
    </a> 
</p> 
<br>

My code

from bs4 import BeautifulSoup

# Parse the file; find_all() with no arguments returns every element in the tree
with open('ya.html') as f:
    soup = BeautifulSoup(f, "html.parser")
num_apperances_of_tag = len(soup.find_all())

print(num_apperances_of_tag)


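Output

But this is not what I want, because my code counts <p> </p> as a single tag, whereas I want to count opening and closing tags separately.

How can I count the number of opening and closing tags in the HTML? So the output will be

Thanks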

Source

2016-11-11 Kim Hyesung

I suggest using the HTMLParser class to solve this problem:

from html.parser import HTMLParser  # Python 3; in Python 2 the module is named HTMLParser

number_of_starttags = 0
number_of_endtags = 0

# create a subclass and override the handler methods
class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        # called once for every opening tag, e.g. <p>
        global number_of_starttags
        number_of_starttags += 1

    def handle_endtag(self, tag):
        # called once for every closing tag, e.g. </p>
        global number_of_endtags
        number_of_endtags += 1

# instantiate the parser and feed it some HTML
parser = MyHTMLParser()
parser.feed('<html><head><title>Test</title></head><body><h1>Parse me!</h1></body></html>')

print(number_of_starttags, number_of_endtags)
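
A variant that avoids module-level globals may be easier to reuse. The following is only a sketch, assuming Python 3; the names TagCounter and count_tags are made up for illustration and are not part of any library. It keeps the counts on the parser instance and also records them per tag name.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts start and end tags separately, keyed by tag name."""

    def __init__(self):
        super().__init__()
        self.start_tags = Counter()
        self.end_tags = Counter()

    def handle_starttag(self, tag, attrs):
        self.start_tags[tag] += 1

    def handle_endtag(self, tag):
        self.end_tags[tag] += 1


def count_tags(html):
    """Return (number of start tags, number of end tags) found in html."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return sum(parser.start_tags.values()), sum(parser.end_tags.values())


# <p>, <a> and <br> are start tags; </a> and </p> are end tags
sample = '<p><a href="#"></a><br></p>'
starts, ends = count_tags(sample)
print(starts, ends, starts + ends)  # 3 2 5
```

For the file in the question, the same function could be called on the file contents, for example count_tags(open('ya.html').read()). Note that a void tag such as <br> contributes a start tag only, so the two counts need not match.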

Source

2016-11-11 15:01:33

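It didn't work for me. I get UnboundLocalError: local variable 'number_of_starttags' referenced before assignment.

Right, that is because of the class. Just declare the variables as global and it will work fine.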
