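nokogiri and xpath - nested loop over a dataset

2012-01-12 46 views

I'm trying to loop through each element, but I'm having trouble with the inner loop below. It looks to me like the XPath pattern '*/td' isn't returning any results. I expect to see the data inside the <td> tags printed to stdout. I'm using Nokogiri.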

I pasted this into my Rails console:

require 'nokogiri' 
f = File.open("public/index.html") 
doc = Nokogiri::HTML(f) 
f.close 

doc.xpath('//*[@id="WhoIsOnDutyTableLevel4"]/tbody/tr').each do |row| 
    puts "row= " + row.to_s 
    row.xpath('*/td').each do |td|
        puts "td= " + td
    end
end 

And here is the output from the console:

row= <tr id="208894"> 
<td headers="WhoIsOnDutyTableLevel1:header:1"><a href="/alarmpoint/UserDevices.do;jsessionid=17gaw4aw5pv8s?_data=KpBkJeR08z6mdgIY4sPrzAixAYz%2BqH6ZPkanPQ24VqQFpjRFPQiWigQHttJBTMFaCLEBjP6ofpk%2B%0D%0ARqc9DbhWpI1nHAqm8ex%2BxOmu7xYUNxRSU0XUo1xoRw%3D%3D" name="user1" id="user1" class="details">User 1</a></td> 
<td headers="WhoIsOnDutyTableLevel1:header:2">PERSON</td> 
<td headers="WhoIsOnDutyTableLevel1:header:3">0</td> 
</tr> 
row= <tr id="207792"> 
<td headers="WhoIsOnDutyTableLevel1:header:1"><a href="/alarmpoint/UserDevices.do;jsessionid=17gaw4aw5pv8s?_data=KpBkJeR08z6AOzsYzBi7dAixAYz%2BqH6ZPkanPQ24VqQFpjRFPQiWigQHttJBTMFaCLEBjP6ofpk%2B%0D%0ARqc9DbhWpI1nHAqm8ex%2BxOmu7xYUNxRSU0XUo1xoRw%3D%3D" name="user2" id="user2" class="details">User 2</a></td> 
<td headers="WhoIsOnDutyTableLevel1:header:2">PERSON</td> 
<td headers="WhoIsOnDutyTableLevel1:header:3">5</td> 
</tr> 
=> 0 

Here is the HTML I'm parsing:

<table class="duty-report-level1" id="WhoIsOnDutyTableLevel1"> 
<caption></caption> 
<thead> 

<tr> 
<th id="WhoIsOnDutyTableLevel1:header:1" class="duty-report-lt-header">c</th> 
</tr> 
</thead> 
<tfoot></tfoot> 
<tbody> 
<tr> 
<td headers="WhoIsOnDutyTableLevel1:header:1"> 
<table class="duty-report-level2" id="WhoIsOnDutyTableLevel2"> 
<caption></caption> 
<thead> 
<tr> 
<th id="WhoIsOnDutyTableLevel1:header:1">Group Name</th><th id="WhoIsOnDutyTableLevel1:header:2">Group Time Zone</th><th id="WhoIsOnDutyTableLevel1:header:3">Default Devices</th><th id="WhoIsOnDutyTableLevel1:header:4">Supervisors</th> 

</tr> 
</thead> 
<tfoot></tfoot> 
<tbody> 
<tr> 
<td headers="WhoIsOnDutyTableLevel1:header:1"><a href="/alarmpoint/GroupDetails.do;jsessionid=17gaw4aw5pv8s?_data=TJZuNquzHUgWcre8AVcKpAFRUsezgPKzbHn7hwtTf9Ei0C2PJ8QYcKIy8OkorCWT8HDTAzkon1ls%0D%0AefuHC1N%2F0SLQLY8nxBhwesdd7Zeg6NzvCfuzRqLg5g%3D%3D" name="team1" id="team1" class="details">Team 1</a></td><td headers="WhoIsOnDutyTableLevel1:header:2" class="centered-text">US/Pacific</td><td headers="WhoIsOnDutyTableLevel1:header:3" class="centered-text"><img src="/static/images/icon_boolean_false.png" alt="No" border="0"></td><td headers="WhoIsOnDutyTableLevel1:header:4"> 
<values> 
</values><a href="/alarmpoint/UserDevices.do;jsessionid=17gaw4aw5pv8s?_data=KpBkJeR08z7AnuRhH67H6AixAYz%2BqH6ZPkanPQ24VqQFpjRFPQiWigQHttJBTMFaCLEBjP6ofpk%2B%0D%0ARqc9DbhWpI1nHAqm8ex%2BxOmu7xYUNxRSU0XUo1xoRw%3D%3D" name="mgr1" id="mgr1" class="details">Mgr 1</a> 
<br> 








</td> 
</tr> 
<tr> 
<td headers="WhoIsOnDutyTableLevel1:header:1" class="no-padding" colspan="4"> 
<table class="duty-report-level3" id="WhoIsOnDutyTableLevel3"> 
<caption></caption> 
<thead> 
<tr> 
<th id="WhoIsOnDutyTableLevel1:header:1" class="th-left">a</th><th id="WhoIsOnDutyTableLevel1:header:2" class="">b</th> 
</tr> 
</thead> 

<tfoot></tfoot> 
<tbody> 
<tr> 
<td headers="WhoIsOnDutyTableLevel1:header:1" class="no-padding" colspan="2"> 
<table class="duty-report-level4" id="WhoIsOnDutyTableLevel4"> 
<caption></caption> 
<thead> 
<tr> 
<th id="WhoIsOnDutyTableLevel1:header:1">Recipient</th><th id="WhoIsOnDutyTableLevel1:header:2">Category</th><th id="WhoIsOnDutyTableLevel1:header:3">Escalation</th> 
</tr> 
</thead> 
<tfoot></tfoot> 
<tbody> 
<tr id="208894"> 

<td headers="WhoIsOnDutyTableLevel1:header:1"><a href="/alarmpoint/UserDevices.do;jsessionid=17gaw4aw5pv8s?_data=KpBkJeR08z6mdgIY4sPrzAixAYz%2BqH6ZPkanPQ24VqQFpjRFPQiWigQHttJBTMFaCLEBjP6ofpk%2B%0D%0ARqc9DbhWpI1nHAqm8ex%2BxOmu7xYUNxRSU0XUo1xoRw%3D%3D" name="user1" id="user1" class="details">User 1</a></td><td headers="WhoIsOnDutyTableLevel1:header:2">PERSON</td><td headers="WhoIsOnDutyTableLevel1:header:3">0</td> 
</tr> 
<tr id="207792"> 
<td headers="WhoIsOnDutyTableLevel1:header:1"><a href="/alarmpoint/UserDevices.do;jsessionid=17gaw4aw5pv8s?_data=KpBkJeR08z6AOzsYzBi7dAixAYz%2BqH6ZPkanPQ24VqQFpjRFPQiWigQHttJBTMFaCLEBjP6ofpk%2B%0D%0ARqc9DbhWpI1nHAqm8ex%2BxOmu7xYUNxRSU0XUo1xoRw%3D%3D" name="user2" id="user2" class="details">User 2</a></td><td headers="WhoIsOnDutyTableLevel1:header:2">PERSON</td><td headers="WhoIsOnDutyTableLevel1:header:3">5</td> 
</tr> 




</tbody> 
</table> 

</td> 
</tr> 
</tbody> 
</table> 
</td> 
</tr> 
</tbody> 
</table> 
</td> 
</tr> 
</tbody> 
</table> 

Sorry, I'm expecting to see the data inside the <td> tags printed out. – sybind 2012-01-12 19:42:17

Answers

You need a small change to your XPath:

doc.xpath('//*[@id="WhoIsOnDutyTableLevel4"]/tbody/tr').each do |row| 
    # puts "row= " + row.to_s 
    row.xpath('./td').each do |td|
        puts "td= " + td.text
    end
end 

which outputs:

 
td= User 1 
td= PERSON 
td= 0 
td= User 2 
td= PERSON 
td= 5 

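Using ./td as the XPath for the cells basically means "starting from this point (the current tr), look down one level for td elements".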

Personally, unless you absolutely need XPath, I recommend CSS accessors. They're more readable and often much simpler:

doc.search('#WhoIsOnDutyTableLevel4 tbody tr').each do |row| 
    row.search('td').each do |td|
        puts "td= " + td.text
    end
end 

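I recommend using search instead of css or xpath, and at instead of at_css or at_xpath. search and at accept either a CSS selector or an XPath expression, so there is no real magic gained by picking the specific method; you'd just have two different methods to remember instead of one.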

Thanks a lot. This was driving me nuts. – sybind 2012-01-12 20:20:32

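Parsing XML/HTML takes a little while to get the hang of, but once you do, it's easy to pick apart pages and XML data. XPath is very powerful, but to me it looks like line noise, which is why I prefer CSS. – 2012-01-12 21:10:46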

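The XPath expression in the inner loop is evaluated relative to each tr, so you want td (which selects the td element children of the context tr), not */td (which selects td grandchildren).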

Full code:

doc.xpath('//*[@id="WhoIsOnDutyTableLevel4"]/tbody/tr').each do |row| 
    puts "row= " + row.to_s 
    row.xpath('td').each do |td|
        puts "td= " + td
    end
end