ruby-on-rails
  • ruby
  • dom
  • parsing
  • nokogiri
  • 2009-05-04 77 views 4 likes 
    4

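    Nokogiri: select content between element A and B

    What is the smartest way to have Nokogiri select all content between a start element and a stop element (including the start and stop elements themselves)?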

    Check the example code below to see what I am looking for:

    require 'rubygems' 
    require 'nokogiri' 
    
    value = Nokogiri::HTML.parse(<<-HTML_END)
        <html>
        <body>
         <p id='para-1'>A</p>
         <div class='block' id='X1'>
         <p class="this">Foo</p>
         <p id='para-2'>B</p>
         </div>
         <p id='para-3'>C</p>
         <p class="that">Bar</p>
         <p id='para-4'>D</p>
         <p id='para-5'>E</p>
         <div class='block' id='X2'>
         <p id='para-6'>F</p>
         </div>
         <p id='para-7'>F</p>
         <p id='para-8'>G</p>
        </body>
        </html>
    HTML_END
    
    parent = value.css('body').first 
    
    # START element 
    @start_element = parent.at('p#para-3') 
    # STOP element 
    @end_element = parent.at('p#para-7') 
    

    The result (return value) should look like this:

    <p id='para-3'>C</p> 
    <p class="that">Bar</p> 
    <p id='para-4'>D</p> 
    <p id='para-5'>E</p> 
    <div class='block' id='X2'> 
        <p id='para-6'>F</p> 
    </div> 
    <p id='para-7'>F</p> 
    

    Update: this is my current solution, but I think there must be something smarter:

    @my_content = ""
    @selected_node = true

    def collect_content(_start)
        if _start == @end_element
            @my_content << _start.to_html
            @selected_node = false
        end

        if @selected_node == true
            @my_content << _start.to_html
            collect_content(_start.next)
        end
    end

    collect_content(@start_element)

    puts @my_content
    

    Answers

    10

    A way-too-smart one-liner that uses recursion:

    def collect_between(first, last) 
        first == last ? [first] : [first, *collect_between(first.next, last)] 
    end 
    

    An iterative solution:

    def collect_between(first, last)
        result = [first]
        until first == last
            first = first.next
            result << first
        end
        result
    end
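
    Both versions above assume that last really is a later sibling of first. If it is not, first.next eventually returns nil and the code fails with a NoMethodError. The following is a minimal sketch of a guarded variant, not part of the original answer. FakeNode and collect_between_guarded are hypothetical names: FakeNode only mimics the #next method so that the snippet runs without the nokogiri gem, and with real nodes you would pass @start_element and @end_element instead.

```ruby
# FakeNode is a hypothetical stand-in for Nokogiri::XML::Node; it only
# provides #next so that this sketch runs without the nokogiri gem.
FakeNode = Struct.new(:name, :next)

# Collect first, last and every sibling in between.
# Raises ArgumentError (instead of failing on nil) when last is never reached.
def collect_between_guarded(first, last)
  result = []
  node = first
  until node.nil?
    result << node
    return result if node == last
    node = node.next
  end
  raise ArgumentError, 'last is not a following sibling of first'
end

# Build a tiny sibling chain: c -> d -> e
e = FakeNode.new('e', nil)
d = FakeNode.new('d', e)
c = FakeNode.new('c', d)

puts collect_between_guarded(c, e).map(&:name).inspect  # prints ["c", "d", "e"]
```

    Like the original, this walks every sibling, so with real Nokogiri nodes the result also contains the whitespace text nodes between the elements.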
    

    Edit: a (short) explanation of the asterisk

    It is called the splat operator. It "unpacks" an array:

    array = [3, 2, 1] 
    [4, array] # => [4, [3, 2, 1]] 
    [4, *array] # => [4, 3, 2, 1] 
    
    some_method(array) # => some_method([3, 2, 1]) 
    some_method(*array) # => some_method(3, 2, 1) 
    
    def other_method(*array); array; end 
    other_method(1, 2, 3) # => [1, 2, 3] 
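
    To see why the recursive one-liner needs the splat, here is a small sketch that applies the same recursion to plain integers instead of nodes. The method names range_nested and range_flat are made up for this illustration. Without the asterisk, every recursive call adds another level of nesting; with it, the inner result is spliced into the outer array.

```ruby
# Same recursion shape as collect_between, but on integers (illustration only).

# Without the splat: each recursive result is inserted as one nested element.
def range_nested(from, to)
  from == to ? [from] : [from, range_nested(from + 1, to)]
end

# With the splat: the recursive result is unpacked into the surrounding array.
def range_flat(from, to)
  from == to ? [from] : [from, *range_flat(from + 1, to)]
end

puts range_nested(1, 3).inspect  # prints [1, [2, [3]]]
puts range_flat(1, 3).inspect    # prints [1, 2, 3]
```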
    
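    +0

    Thanks for your solution, and thanks for the uber-smart recursive one-liner! However, I don't understand what the "*" in front of the recursive call to collect_between() stands for. Could you elaborate? – Javier 2009-05-06 08:21:50

    +1

    I have added a short explanation to my original answer. Google around for "splat operator" to find more :-) – 2009-05-07 19:58:50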

    2
    # monkeypatches for Nokogiri::NodeSet 
    # note: versions of these functions will be in Nokogiri 1.3 
    class Nokogiri::XML::NodeSet
        unless method_defined?(:index)
            # Return the position of node in the set, or nil if it is absent.
            def index(node)
                each_with_index { |member, j| return j if member == node }
                nil
            end
        end

        unless method_defined?(:slice)
            # Return a new NodeSet holding length nodes, beginning at start.
            def slice(start, length)
                new_set = Nokogiri::XML::NodeSet.new(document)
                length.times { |offset| new_set << self[start + offset] }
                new_set
            end
        end
    end
    
    # 
    # solution #1: picking elements out of node children 
    # NOTE that this will also include whitespacy text nodes between the <p> elements. 
    # 
    possible_matches = parent.children 
    start_index = possible_matches.index(@start_element) 
    stop_index = possible_matches.index(@end_element) 
    answer_1 = possible_matches.slice(start_index, stop_index - start_index + 1) 
    
    # 
    # solution #2: picking elements out of a NodeSet 
    # this will only include elements, not text nodes. 
    # 
    possible_matches = value.xpath("//body/*") 
    start_index = possible_matches.index(@start_element) 
    stop_index = possible_matches.index(@end_element) 
    answer_2 = possible_matches.slice(start_index, stop_index - start_index + 1) 
    
    2

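    For the sake of completeness, an XPath-only solution :)
    It builds the intersection of two sets: the following siblings of the start element and the preceding siblings of the end element.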

    Basically, you can build an intersection with:

    $a[count(.|$b) = count($b)] 
    

    Split into a few variables for readability:

    @start_element = "//p[@id='para-3']" 
    @end_element = "//p[@id='para-7']" 
    @set_a = "#@start_element/following-sibling::*" 
    @set_b = "#@end_element/preceding-sibling::*" 
    
    @my_content = value.xpath("#@set_a[ count(.|#@set_b) = count(#@set_b) ] 
             | #@start_element | #@end_element") 
    

    The sibling axes do not include the element itself, so the start and end elements must be added to the expression separately.
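
    The count(.|$b) = count($b) test can look like magic. The sketch below mimics it with plain Ruby arrays standing in for XPath node-sets (an assumption made purely for illustration; the strings are just labels for the children of <body> in the question's document, and xpath_style_intersection is a made-up name). A node from a is kept only if adding it to b does not make b any larger, which means it is already a member of b.

```ruby
# Plain Ruby arrays stand in for XPath node-sets here (illustration only).
# Each string labels a child of <body> by its id or class.
following_start = %w[that para-4 para-5 X2 para-7 para-8]     # following-sibling::* of para-3
preceding_end   = %w[para-1 X1 para-3 that para-4 para-5 X2]  # preceding-sibling::* of para-7

# Mimics $a[count(.|$b) = count($b)]:
# keep a node if its union with b is no bigger than b itself.
def xpath_style_intersection(a, b)
  a.select { |node| (b | [node]).size == b.size }
end

between = xpath_style_intersection(following_start, preceding_end)
puts between.inspect  # prints ["that", "para-4", "para-5", "X2"]
```

    The result contains neither para-3 nor para-7, which is why the XPath expression unions the start and end elements back in.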

    Edit: a simpler solution:

    @start_element = "p[@id='para-3']" 
    @end_element = "p[@id='para-7']" 
    @my_content = value.xpath("//*[preceding-sibling::#@start_element and 
               following-sibling::#@end_element] 
             | //#@start_element | //#@end_element") 
    