How to search a single node rather than all nodes

I am using an XPath selector to select every item on a page (about 24), and then I run further XPath selectors on each item to extract values from it.

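Even though I run the XPath selector on a child node, it appears to search across all of the child nodes. I only want it to run on each child node individually.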

The following code searches doc for every item and iterates over each html_listing, passing each one to get_field_data_from:

def get_listing(doc, field_data = {})
  doc.xpath(get_listing_tag[:path]).each do |html_listing|
    fd = get_field_data_from(html_listing, field_data)
    if !field_data && fd.detect { |_, data| !data }
      set_uri doc.xpath(get_sub_page_tag[:path])
      get
      fd = get_listing(Nokogiri::HTML(body), fd)
    end
    yield fd
  end
end

To iterate over all of the Fields I am looking for, I look up each field's selector with

selector = send("get_%s_tag" % field)

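If a selector exists (the retrieved XPath selector contains a string) and the data has not already been found, it runs the XPath selector on the HTML item and stores the text using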

res[field] = item.xpath(selector[:path]).inner_text

It then returns the result hash to be used in the next iteration.

def get_field_data_from(item, data)
  Fields.inject(data) do |res, field|
    selector = send("get_%s_tag" % field)
    unless !selector || res[field]
      begin
        res[field] = item.xpath(selector[:path]).inner_text
      rescue Exception => e
        puts "Error for field: %s" % field
        raise e
      end
    end
    res
  end
end

Somehow, doing

res[field] = item.xpath(selector[:path]).inner_text

seems to search all of the items rather than just the given listing item. I know this is happening because

doing:

puts item.xpath(selector[:path]).inner_text

returns more than one result.

I am not actually iterating over all of the html_listings. At the point where get_listing yields the field data (yield fd), I do a break, so it only runs once.

I can't figure out what is going on. Can anyone else see it?

Source

2016-12-30 Thermatix

You need to anchor the XPath query to the element:

node.xpath("//example") does a global search over the whole document
node.xpath(".//example") does a local search starting from the current node

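Note the leading dot (.), which anchors the query to the current node. Without it, the query runs against the root node, even if you call it on the current node.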

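If you are searching by tag name, consider using CSS selectors instead. They have fewer pitfalls than XPath, and a CSS query always searches from the current node.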

Source

2016-12-30 20:06:24 akuhn

Even when it is run on a child node? – Thermatix

Alas, yes, XPath is confusing. – akuhn

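wtf? What is the point of running an XPath on a child node if it is still going to run against all of that node's siblings? That makes no sense. Anyway, thank you, it works; I will accept the answer when Stack lets me. – Thermatix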

There is another, equally serious problem.

item.xpath(selector[:path]).inner_text

xpath returns a NodeSet. inner_text concatenates the text of every node in the NodeSet, producing a single string that is usually not what you want.

require 'nokogiri'

doc = Nokogiri::HTML(<<EOT)
<html>
  <body>
    <p>foo</p>
    <p>bar</p>
  </body>
</html>
EOT

doc.search('p').class # => Nokogiri::XML::NodeSet
doc.search('p').inner_text # => "foobar"

Instead, use map to walk the list of nodes and get the text of each one:

doc.search('p').map(&:inner_text) # => ["foo", "bar"]

or, more simply:

doc.search('p').map(&:text) # => ["foo", "bar"]

See also "How to avoid joining all text from Nodes when scraping".

Source

2017-01-06 20:24:04
