XPath to get only the text after the first HTML tag

I have the following block:

<div class="text"> 
 
    <h1>Headerh1</h1> 
 
    Text1 <br/> after header1 
 
    <h3>Headerh3.1</h3> 
 
    Text2 <br/> after header3.1 
 
    <h3>Headerh3.2</h3> 
 
    Text3 <br/> after header3.2 
 
    <h3>Headerh3.3</h3> 
 
    Text4 <br/> after header3.3 
 
</div>

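Using //div[@class='text']/text()[count(preceding-sibling::h1)=1] returns the text after every header, not just the first one. How can I get only "Text1 after header1", i.e. the text after the first h1, ignoring the <br/> and all the headers? There can be zero or more <br> tags.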


2017-08-05 dMazay
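Because the document has a single h1, every text node after it has exactly one preceding h1 sibling, so count(preceding-sibling::h1)=1 also matches the text after each h3. The sketch below reproduces that selection without a real XPath engine: it uses Python's standard-library xml.etree.ElementTree (whose XPath subset has no text() node test or sibling axes) and emulates the predicate by hand. It assumes the snippet is well-formed XML, and the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

# The markup from the question (blank lines removed).
HTML = """<div class="text">
    <h1>Headerh1</h1>
    Text1 <br/> after header1
    <h3>Headerh3.1</h3>
    Text2 <br/> after header3.1
    <h3>Headerh3.2</h3>
    Text3 <br/> after header3.2
    <h3>Headerh3.3</h3>
    Text4 <br/> after header3.3
</div>"""


def texts_with_h1_count(div, count):
    """Emulate div/text()[count(preceding-sibling::h1)=count].

    ElementTree keeps the text before the first child in div.text and the
    text after each child in child.tail, so each tail is paired with the
    number of <h1> siblings seen up to and including that child.
    """
    h1_seen = 0
    pairs = [(0, div.text)]  # text before any child has no preceding <h1>
    for child in div:
        if child.tag == "h1":
            h1_seen += 1
        pairs.append((h1_seen, child.tail))
    # Whitespace-only nodes (which XPath would also return) are dropped here.
    return [t.strip() for n, t in pairs if n == count and t and t.strip()]


print(texts_with_h1_count(ET.fromstring(HTML), 1))
# ['Text1', 'after header1', 'Text2', 'after header3.1',
#  'Text3', 'after header3.2', 'Text4', 'after header3.3']
```

The predicate only counts h1 siblings; nothing in it stops at the first h3, which is why an extra bound on the h3 side is needed.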

Try the following XPath. It should return all text nodes of the div that come before the first h3:

//div[@class='text']/h3[1]/preceding-sibling::text()


2017-08-05 17:42:10 Andersson

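It works, but if there is text directly after the opening div tag (before the h1), it is returned too. Is it possible to add a condition so that only the text after the h1 and before the h3 is returned? – dMazay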

Yes. This one should do the trick: //div[@class='text']/h3[1]/preceding-sibling::text()[./preceding-sibling::h1] – Andersson

Solved. It works! – dMazay
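To see what the two expressions select, here is a sketch that emulates both of them with Python's standard-library xml.etree.ElementTree instead of an XPath engine (ElementTree's XPath subset has no text() node test or sibling axes). It assumes the markup is well-formed XML; the helper names and the extra "Intro" text are hypothetical, added to reproduce the case raised in the comments. With lxml installed, the expressions could instead be evaluated directly through an element's xpath() method.

```python
import xml.etree.ElementTree as ET

# The question's markup, shortened, plus a hypothetical "Intro" text node
# before <h1> to reproduce the case raised in the comments.
HTML = """<div class="text">Intro
    <h1>Headerh1</h1>
    Text1 <br/> after header1
    <h3>Headerh3.1</h3>
    Text2 <br/> after header3.1
</div>"""


def text_children(div):
    """Yield (tags_of_preceding_element_siblings, text) for each text child.

    ElementTree stores the text before the first child in div.text and the
    text following each child element in child.tail.
    """
    yield [], div.text
    seen = []
    for child in div:
        seen.append(child.tag)
        yield list(seen), child.tail


def before_first_h3(div, require_h1=False):
    """Emulate h3[1]/preceding-sibling::text(), optionally adding the
    [./preceding-sibling::h1] predicate when require_h1 is True."""
    result = []
    for tags, text in text_children(div):
        if "h3" in tags:  # this text comes after the first <h3>: stop
            break
        if require_h1 and "h1" not in tags:
            continue
        # XPath would also return whitespace-only nodes; they are skipped
        # here and the rest stripped so the result is easier to read.
        if text and text.strip():
            result.append(text.strip())
    return result


div = ET.fromstring(HTML)
print(before_first_h3(div))                   # ['Intro', 'Text1', 'after header1']
print(before_first_h3(div, require_h1=True))  # ['Text1', 'after header1']
```

The "Intro" node is returned by the first query and dropped once the h1 predicate is added, which is exactly the difference discussed in the comments.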

I assume this HTML is in a file called demo.html in your working directory:

from bs4 import BeautifulSoup

# The with block closes the file automatically; no f.close() needed.
with open("demo.html") as f:
    soup = BeautifulSoup(f.read(), 'html.parser')

# Text inside the <h1> tag (the header itself)
h1 = soup.find('h1').text
# Text inside each <h3> tag
h3 = [i.text for i in soup.find_all('h3')]

The output will be Unicode strings, for example:

h3 = [u'Headerh3.1', u'Headerh3.2', u'Headerh3.3']

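In Python 2, convert them to plain byte strings like this (in Python 3, .text already returns str, and .encode('utf-8') would produce bytes instead):

h3 = [i.text.encode('utf-8') for i in soup.find_all('h3')]
h1 = soup.find('h1').text.encode('utf-8')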


2017-08-05 17:16:44

I need the text between the h1 and h3 headers: "Text1 after header1". – dMazay
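For the text between the headers rather than the headers' own text, here is a minimal alternative sketch that uses only the standard-library html.parser.HTMLParser, which also tolerates an unclosed <br> that an XML parser would reject. The class and function names are hypothetical, and it assumes, as in the question, that the wanted text starts right after the first </h1> and ends at the next <h3>.

```python
from html.parser import HTMLParser


class FirstH1TextParser(HTMLParser):
    """Collect the text between the first </h1> and the next <h3>.

    Hypothetical helper: tags in between (such as zero or more <br>)
    are skipped and only their surrounding text is kept.
    """

    def __init__(self):
        super().__init__()
        self.state = "before_h1"  # -> "in_h1" -> "collecting" -> "done"
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and self.state == "before_h1":
            self.state = "in_h1"
        elif tag == "h3" and self.state == "collecting":
            self.state = "done"

    def handle_endtag(self, tag):
        if tag == "h1" and self.state == "in_h1":
            self.state = "collecting"

    def handle_data(self, data):
        if self.state == "collecting" and data.strip():
            self.parts.append(data.strip())


def text_after_first_h1(html):
    parser = FirstH1TextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


# An unclosed <br> is accepted here, alongside the self-closing <br/>.
sample = ('<div class="text"><h1>Headerh1</h1> Text1 <br> after header1 '
          '<h3>Headerh3.1</h3> Text2 <br/> after header3.1</div>')
print(text_after_first_h1(sample))  # Text1 after header1
```

The small state machine keeps this to a single pass over the markup. Unlike the XPath answer, it does not check that the text is a direct child of the div, and if no h3 follows, everything after the h1 is returned.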
