Extracting values - Python XPath

I'm trying to extract part of a web page that is structured like this:

<div class="entry"> 
    <span>Title</span> 

    <h2>Title1</h2> 
    <p>Content1 details</p> 
    <ul> 
      <li>Content1 list</li> 
    </ul> 
    <p>More content1 details</p> 

    <h2>Title2</h2> 
    <p>Content2 details</p> 
    <p>More content2 details</p> 
    <p>More content2 details</p> 
</div>

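I want to extract all the tags between Title1 and Title2 into one list, and all the tags after Title2 into another list.

Is it possible to use some kind of regular expression in XPath? How can I do this?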

2015-11-03 Kuzgun

Maybe use 'beautifulsoup'? – Berci

Check http://stackoverflow.com/questions/18207439/extracting-content-between-two-tags-with-xpath – eLRuLL

Presumably you are looking for elements rather than tags. As in, you are not interested in the '' end tag. –

Combine the preceding-sibling and following-sibling axes. Demo from the Scrapy shell:

In [1]: for item in response.xpath("//*[preceding-sibling::h2 = 'Title1' and following-sibling::h2 = 'Title2']").extract(): 
    ...:  print(item) 
    ...:  
<p>Content1 details</p> 
<ul> 
     <li>Content1 list</li> 
</ul> 
<p>More content1 details</p>
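
The question also asks for the elements after Title2, which the demo above does not cover. The following is a minimal standalone sketch, not part of the original demo. It assumes the lxml package is installed and reuses the sample markup from the question; the names PAGE, between and after are illustrative only. It applies the same sibling-axis idea outside the Scrapy shell and collects both lists:

```python
from lxml import html

# Sample markup copied from the question.
PAGE = """
<div class="entry">
    <span>Title</span>

    <h2>Title1</h2>
    <p>Content1 details</p>
    <ul>
      <li>Content1 list</li>
    </ul>
    <p>More content1 details</p>

    <h2>Title2</h2>
    <p>Content2 details</p>
    <p>More content2 details</p>
    <p>More content2 details</p>
</div>
"""

root = html.fromstring(PAGE)

# Elements that have the "Title1" h2 before them and the "Title2" h2
# after them, all at the same sibling level.
between = root.xpath(
    "//*[preceding-sibling::h2 = 'Title1' and following-sibling::h2 = 'Title2']"
)

# Elements that have the "Title2" h2 somewhere before them as a sibling.
after = root.xpath("//*[preceding-sibling::h2 = 'Title2']")

for label, elements in (("between", between), ("after", after)):
    print(label)
    for element in elements:
        # with_tail=False drops the whitespace that follows each element.
        print(html.tostring(element, encoding="unicode", with_tail=False))
```

No regular expression is needed for this case: in XPath 1.0, comparing a node-set such as preceding-sibling::h2 to a string is true when any node in the set has that string value.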

2015-11-03 22:12:50 alecxe
