How to select the text of a specific li child of a ul using Scrapy's CSS or XPath selectors?

Below is the HTML I am working with:

<div class="grdcpnsmllnks"> 
    <ul> 
     <li><i class="fa fa-check-square"></i>Verified Offer</li> 
     <li><i class="fa fa-eye"></i><label id="ltveri276270">Offer used 1 hour ago</label></li> 
     <li><i class="fa fa-clock-o"></i>Valid till 31/12/2016</li> 
    </ul> 
</div>

Here is my code snippet:

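def parse_item(self, response):
    # Attempted selector: @class='fa-clock-o' never matches class="fa fa-clock-o",
    # and '::dd[1]' is not a valid XPath step, so this selector fails.
    endDate = response.xpath("//div[@class='grdcpnsmllnks']/ul/li/i[@class='fa-clock-o']::dd[1]/text()").extract()
    yield {
        'endDate': endDate
    }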

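The text I want to select is "Valid till 31/12/2016". I am having trouble first selecting the right <li> tag and then selecting the text that is not enclosed in any tag. Please suggest how to do this with an XPath or CSS selector.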

2016-12-14 Aman Agarwal

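I would find the li element by checking for a child i element whose class attribute contains fa-clock-o, then take its direct child text() nodes, and then extract the date with the .re_first() method: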

In [1]: response.xpath("//div[@class='grdcpnsmllnks']//li[i[contains(@class, 'fa-clock-o')]]/text()").re_first(r"Valid till\s+(\d+/\d+/\d+)") 
Out[1]: u'31/12/2016'

2016-12-14 14:02:20 alecxe

When I use the above command with .extract(), I get an AttributeError: 'unicode' object has no attribute 'extract'. –

@AmanAgarwal The value is already extracted by .re_first(); there is no need to call .extract() on top of it. – alecxe

But even so, it does not return the value; it returns None. –