Changing the text of links, then clicking them, in Ruby Mechanize

I managed to fill out a form with Mechanize and get a list of links. Part of the result looks like:

[ 
    #<Mechanize::Page::Link "View" "/cgi-bin/dcdev/forms/C00508200/800329/">, 
    #<Mechanize::Page::Link "View" "/cgi-bin/dcdev/forms/C00487363/800634/">, 
    #<Mechanize::Page::Link "View" "/cgi-bin/dcdev/forms/C00498097/800463/"> 
]

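I haven't been able to figure out what to do next.

The pages I need to scrape are not these links, but ones with /sa/ALL at the end of the link, for example: /cgi-bin/dcdev/forms/C00508200/800329/sa/ALL. How do I add sa/ALL to the end of each link?
Then, how can I click each corrected link and save the resulting page? A loop?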

Source

2012-08-23 piao1780

# Fetch the corrected URL for each link and save the resulting page.
page.links.each do |link|
    agent.get(link.href + 'sa/ALL').save
end
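
Calling save with no argument leaves the file name up to Mechanize. If each saved page should get a predictable, distinct name, one option is to derive it from the link's path. This is a minimal sketch, assuming the last two path segments (such as C00508200 and 800329) are enough to identify a page; filename_for is a hypothetical helper, not part of Mechanize:

```ruby
# Hypothetical helper: build a file name from the last two path segments
# of an href, e.g. the two ID-like parts in the question's links.
def filename_for(href)
  segments = href.split('/').reject(&:empty?)
  "#{segments.last(2).join('_')}.html"
end

hrefs = [
  '/cgi-bin/dcdev/forms/C00508200/800329/',
  '/cgi-bin/dcdev/forms/C00487363/800634/'
]

# Show which corrected URL would map to which file name.
hrefs.each do |href|
  puts "#{href}sa/ALL -> #{filename_for(href)}"
end
```

Inside the loop, the name could then be handed to save, along the lines of agent.get(link.href + 'sa/ALL').save(filename_for(link.href)).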

Source

2012-08-23 23:32:02 pguardiario

Thanks! This works. – piao1780

Here is how to fish...

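require 'nokogiri'

# Sample markup standing in for the search results page.
doc = Nokogiri::HTML(<<EOT)
<html>
    <body>
    <a href="/cgi-bin/dcdev/forms/C00508200/800329/">View</a>
    <a href="/cgi-bin/dcdev/forms/C00487363/800634/">View</a>
    <a href="/cgi-bin/dcdev/forms/C00498097/800463/">View</a>
    </body>
</html>
EOT

# Each href already ends in '/', so append 'sa/ALL' without a leading slash.
hrefs = doc.search('a').map { |a| a['href'] + 'sa/ALL' }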

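Mechanize uses Nokogiri internally as its HTML parser. You can get at the document Mechanize is using through page.parser, so you are dealing with a Nokogiri document: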

require 'mechanize' 

agent = Mechanize.new 
page = agent.get('http://www.example.net')

Proof:

page.parser.class # => Nokogiri::HTML::Document < Nokogiri::XML::Document

Get the links in the page so they can be manipulated:

page.parser.search('a').map(&:to_html)

Which returns:

[ 
    [ 0] "<a href=\"/\"><img src=\"/_img/iana-logo-pageheader.png\" alt=\"Homepage\"></a>", 
    [ 1] "<a href=\"/domains/\">Domains</a>", 
    [ 2] "<a href=\"/numbers/\">Numbers</a>", 
    [ 3] "<a href=\"/protocols/\">Protocols</a>", 
    [ 4] "<a href=\"/about/\">About IANA</a>", 
    [ 5] "<a href=\"/go/rfc2606\">RFC 2606</a>", 
    [ 6] "<a href=\"/about/\">About</a>", 
    [ 7] "<a href=\"/about/presentations/\">Presentations</a>", 
    [ 8] "<a href=\"/about/performance/\">Performance</a>", 
    [ 9] "<a href=\"/reports/\">Reports</a>", 
    [10] "<a href=\"/domains/\">Domains</a>", 
    [11] "<a href=\"/domains/root/\">Root Zone</a>", 
    [12] "<a href=\"/domains/int/\">.INT</a>", 
    [13] "<a href=\"/domains/arpa/\">.ARPA</a>", 
    [14] "<a href=\"/domains/idn-tables/\">IDN Repository</a>", 
    [15] "<a href=\"/protocols/\">Protocols</a>", 
    [16] "<a href=\"/numbers/\">Number Resources</a>", 
    [17] "<a href=\"/abuse/\">Abuse Information</a>", 
    [18] "<a href=\"http://www.icann.org/\">Internet Corporation for Assigned Names and Numbers</a>", 
    [19] "<a href=\"mailto:[email protected]?subject=General%20website%20feedback\">[email protected]</a>" 
]

Grab them and munge them:

links = page.parser.search('a').map{ |a| a['href'] + 'sa/ALL' } 
[ 
    [ 0] "/sa/ALL", 
    [ 1] "/domains/sa/ALL", 
    [ 2] "/numbers/sa/ALL", 
    [ 3] "/protocols/sa/ALL", 
    [ 4] "/about/sa/ALL", 
    [ 5] "/go/rfc2606sa/ALL", 
    [ 6] "/about/sa/ALL", 
    [ 7] "/about/presentations/sa/ALL", 
    [ 8] "/about/performance/sa/ALL", 
    [ 9] "/reports/sa/ALL", 
    [10] "/domains/sa/ALL", 
    [11] "/domains/root/sa/ALL", 
    [12] "/domains/int/sa/ALL", 
    [13] "/domains/arpa/sa/ALL", 
    [14] "/domains/idn-tables/sa/ALL", 
    [15] "/protocols/sa/ALL", 
    [16] "/numbers/sa/ALL", 
    [17] "/abuse/sa/ALL", 
    [18] "http://www.icann.org/sa/ALL", 
    [19] "mailto:[email protected]?subject=General%20website%20feedbacksa/ALL" 
]

Which links need munging in your application is for you to determine, and how to retrieve them again is left as an exercise.
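
The munged output shows why blind appending is risky: /go/rfc2606 has no trailing slash, and the mailto: and absolute http:// links are not pages to munge at all. Here is a hedged sketch of one way to guard against that, using only Ruby's standard uri library. The munge helper and its skip rules are illustrative assumptions, so adjust them to the site being scraped:

```ruby
require 'uri'

# Hypothetical helper: append a suffix to site-relative hrefs only.
# Returns nil for anything that should be skipped.
def munge(href, suffix = 'sa/ALL')
  return nil if href.nil?                  # anchors without an href

  uri = URI.parse(href)
  return nil if uri.scheme || uri.host     # mailto:, http://..., //host/...

  path = uri.path
  return nil if path.nil? || path.empty?

  path += '/' unless path.end_with?('/')   # avoid "/go/rfc2606sa/ALL"
  path + suffix
rescue URI::InvalidURIError
  nil                                      # unparseable hrefs are skipped too
end

hrefs = ['/domains/', '/go/rfc2606', 'http://www.icann.org/', nil]
p hrefs.map { |h| munge(h) }.compact
```

Returning nil for skipped links keeps the caller simple: map over every href, then compact the result.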

Source

2012-08-23 19:37:23

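Thanks, this works well, but I'm having a hard time figuring out how to further refine the 'search' criteria. My criteria, as you can probably see above, are 'a' and text = 'View'. I'll keep trying, though! – piao1780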
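
For the 'a' plus text = 'View' criterion, Mechanize pages offer links_with, so something like page.links_with(text: 'View') should return only the matching links. The sketch below shows the same filter-then-append idea on stand-in data so that it runs without a network connection. The Link struct is a hypothetical substitute for Mechanize::Page::Link, which also responds to text and href:

```ruby
# Hypothetical stand-in for Mechanize::Page::Link (real links also
# respond to #text and #href).
Link = Struct.new(:text, :href)

links = [
  Link.new('View', '/cgi-bin/dcdev/forms/C00508200/800329/'),
  Link.new('Home', '/'),
  Link.new('View', '/cgi-bin/dcdev/forms/C00487363/800634/')
]

# Keep only the "View" links, then append the suffix to each href.
view_urls = links
  .select { |link| link.text.strip == 'View' }
  .map { |link| link.href + 'sa/ALL' }

puts view_urls
```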
