Unable to extract all spans matching a class or ID

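This is probably something silly, but I want to write a simple scraper to grab the listings on this site: https://online.ncat.nsw.gov.au/Hearing/HearingList.aspx?LocationCode=2000

Eventually it will run over every LocationCode, but this is a sample page.

I want to extract the <span> heading and the table data for each date.

The data generally has this form: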

<span id="lblSubHeader1242017" class="clsGridItem">1:15 PM Wednesday, 12 Apr 2017 at Room 15.6 Level 15, 66 Goulburn st </span> 
<hr /> 
<table id="dg1242017"> 
    <tr class="clsGridItem"> 
     <td width="15%">RT 17/11111</td> 
     <td width="30%">Name of party</td> 
     <td width="55%">Name of party</td> 
    </tr> 
    ... 
</table>

It's crude, but I can grab the data in the tables fairly well with code like this:

import requests
from lxml import html

page = requests.get('https://online.ncat.nsw.gov.au/Hearing/HearingList.aspx?LocationCode=2000')
tree = html.fromstring(page.content)
events = tree.xpath('//table//td/text()')  # flat list of every table cell's text
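The flat list above loses track of which cells belong to which row and table. A minimal sketch of one way to keep that structure, run against the sample markup from the question rather than the live page, might look like this. The wrapping `<div>` and the helper name `rows_by_table` are assumptions made for illustration, not part of the original code.

```python
from lxml import html

# Sample markup copied from the question, wrapped in a <div> (an assumption)
# so that it parses as a single fragment.
SAMPLE = """
<div>
<span id="lblSubHeader1242017" class="clsGridItem">1:15 PM Wednesday, 12 Apr 2017 at Room 15.6 Level 15, 66 Goulburn st </span>
<hr />
<table id="dg1242017">
    <tr class="clsGridItem">
     <td width="15%">RT 17/11111</td>
     <td width="30%">Name of party</td>
     <td width="55%">Name of party</td>
    </tr>
</table>
</div>
"""


def rows_by_table(tree):
    """Hypothetical helper: map each table's id to a list of rows (lists of cell text)."""
    result = {}
    for table in tree.xpath('//table'):
        rows = []
        for tr in table.xpath('.//tr'):
            # text_content() collects the text of a cell including any nested tags
            cells = [td.text_content().strip() for td in tr.xpath('./td')]
            if cells:
                rows.append(cells)
        result[table.get('id')] = rows
    return result


tree = html.fromstring(SAMPLE)
tables = rows_by_table(tree)
print(tables)
```

Keying the result by the table's `id` keeps the date suffix (`1242017` in the sample), which could later be matched against the corresponding `lblSubHeader…` span.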

But when I try to grab the spans outside the tables, so that I have the location and date information, with something like:

days = tree.xpath('//span[starts-with(@id,"lbl")]/text()')

or

days = tree.xpath('//span[@class="clsGridItem"]/text()')

I only get the following two results:

days: ['There are no matters listed in SYDNEY today', 'There are no matters listed in SYDNEY today']

These correspond to two spans about two-thirds of the way down the page:

<span id="lbl1442017" style="font-weight:bold;">SYDNEY: Friday, 14 Apr 2017</span><br /><br /><span id="lblError1442017" class="clsGridItem">There are no matters listed in SYDNEY today</span><br /><br /><br /><span id="lbl1742017" style="font-weight:bold;">SYDNEY: Monday, 17 Apr 2017</span><br /><br /><span id="lblError1742017" class="clsGridItem">There are no matters listed in SYDNEY today</span>

Can anyone explain what I am doing wrong?

Why are the other spans being skipped?

2017-04-07 raf

You can use the code below to get the text content of each <span class="clsGridItem">:

days = tree.xpath('//span[@class="clsGridItem"]//text()')

But I don't know why //span[@class="clsGridItem"]/text() doesn't work, since it should be applicable as well...
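One possible explanation, offered only as an assumption about the live page: in XPath, `/text()` selects just the text nodes that are direct children of the span, while `//text()` selects text nodes at any depth beneath it. If the heading spans on the real page wrap their text in another element, `/text()` would return nothing for them. The sketch below illustrates the difference with hypothetical markup; the `<b>` wrapper and the shortened ids are invented for the demonstration and do not appear in the question's sample.

```python
from lxml import html

# Hypothetical markup: the first span wraps its text in <b> (an assumption),
# the second holds its text directly, like the "no matters listed" spans.
SAMPLE = """
<div>
  <span id="lblSubHeader1" class="clsGridItem"><b>1:15 PM Wednesday, 12 Apr 2017</b></span>
  <span id="lblError1" class="clsGridItem">There are no matters listed in SYDNEY today</span>
</div>
"""

tree = html.fromstring(SAMPLE)

# Direct child text nodes only: the nested heading is missed
direct = tree.xpath('//span[@class="clsGridItem"]/text()')

# Text nodes at any depth under each span: both are found
nested = tree.xpath('//span[@class="clsGridItem"]//text()')

print(direct)
print(nested)
```

Inspecting the raw response (for example `page.text`) around one of the missing spans would show whether this is what actually happens on the site.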

2017-04-07 07:30:25 Andersson
