How does this work for you?
require 'nokogiri'
doc = Nokogiri::HTML(<<EOT)
<html>
<body>
<a href="/cgi-bin/dcdev/forms/C00508200/800329/">
<a href="/cgi-bin/dcdev/forms/C00487363/800634/">
<a href="/cgi-bin/dcdev/forms/C00498097/800463/">
</body>
</html>
EOT
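# The sample hrefs already end in "/", so append 'sa/ALL' rather than '/sa/ALL' to avoid a double slash.
hrefs = doc.search('a').map{ |a| a['href'] + 'sa/ALL' }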
Mechanize uses Nokogiri internally as its HTML parser, so what we're dealing with is a Nokogiri document. You can get at the document Mechanize is using like this:
require 'mechanize'
agent = Mechanize.new
page = agent.get('http://www.example.net')
As proof:
page.parser.class # => Nokogiri::HTML::Document < Nokogiri::XML::Document
Get the links in the page for manipulation:
page.parser.search('a').map(&:to_html)
Which returns:
[
[ 0] "<a href=\"/\"><img src=\"/_img/iana-logo-pageheader.png\" alt=\"Homepage\"></a>",
[ 1] "<a href=\"/domains/\">Domains</a>",
[ 2] "<a href=\"/numbers/\">Numbers</a>",
[ 3] "<a href=\"/protocols/\">Protocols</a>",
[ 4] "<a href=\"/about/\">About IANA</a>",
[ 5] "<a href=\"/go/rfc2606\">RFC 2606</a>",
[ 6] "<a href=\"/about/\">About</a>",
[ 7] "<a href=\"/about/presentations/\">Presentations</a>",
[ 8] "<a href=\"/about/performance/\">Performance</a>",
[ 9] "<a href=\"/reports/\">Reports</a>",
[10] "<a href=\"/domains/\">Domains</a>",
[11] "<a href=\"/domains/root/\">Root Zone</a>",
[12] "<a href=\"/domains/int/\">.INT</a>",
[13] "<a href=\"/domains/arpa/\">.ARPA</a>",
[14] "<a href=\"/domains/idn-tables/\">IDN Repository</a>",
[15] "<a href=\"/protocols/\">Protocols</a>",
[16] "<a href=\"/numbers/\">Number Resources</a>",
[17] "<a href=\"/abuse/\">Abuse Information</a>",
[18] "<a href=\"http://www.icann.org/\">Internet Corporation for Assigned Names and Numbers</a>",
[19] "<a href=\"mailto:[email protected]?subject=General%20website%20feedback\">[email protected]</a>"
]
Munging them:
links = page.parser.search('a').map{ |a| a['href'] + 'sa/ALL' }
[
[ 0] "/sa/ALL",
[ 1] "/domains/sa/ALL",
[ 2] "/numbers/sa/ALL",
[ 3] "/protocols/sa/ALL",
[ 4] "/about/sa/ALL",
[ 5] "/go/rfc2606sa/ALL",
[ 6] "/about/sa/ALL",
[ 7] "/about/presentations/sa/ALL",
[ 8] "/about/performance/sa/ALL",
[ 9] "/reports/sa/ALL",
[10] "/domains/sa/ALL",
[11] "/domains/root/sa/ALL",
[12] "/domains/int/sa/ALL",
[13] "/domains/arpa/sa/ALL",
[14] "/domains/idn-tables/sa/ALL",
[15] "/protocols/sa/ALL",
[16] "/numbers/sa/ALL",
[17] "/abuse/sa/ALL",
[18] "http://www.icann.org/sa/ALL",
[19] "mailto:[email protected]?subject=General%20website%20feedbacksa/ALL"
]
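You can chain whatever munging you decide on to rework the links for your exercise. Note that plain concatenation also touches the mailto: and external links, and it assumes every href ends in a slash (see "/go/rfc2606sa/ALL"), so filter or normalize the hrefs as needed.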
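A minimal sketch of a more careful rework, assuming you only want site-relative paths and exactly one slash before the suffix. `append_suffix` is a hypothetical helper, not part of Mechanize or Nokogiri, and the sample `hrefs` array (including the mailto address) is made up to stand in for what `page.parser.search('a').map{ |a| a['href'] }` would give you. It uses only Ruby's standard `URI` library and does not try to handle query strings or fragments.

```ruby
require 'uri'

# Hypothetical helper: append a suffix to a site-relative href.
# Returns nil for anything that isn't a plain relative path.
def append_suffix(href, suffix = 'sa/ALL')
  return nil if href.nil? || href.empty?

  uri = URI.parse(href)
  # Skip mailto:, http://other-host/ and similar links.
  return nil if uri.scheme || uri.host

  # Ensure exactly one slash between the path and the suffix.
  href.chomp('/') + '/' + suffix
rescue URI::InvalidURIError
  nil
end

# Stand-in for the hrefs pulled out of the page (illustrative values).
hrefs = ['/', '/domains/', '/go/rfc2606', 'http://www.icann.org/',
         'mailto:someone@example.com', nil]

links = hrefs.map { |h| append_suffix(h) }.compact
# links => ["/sa/ALL", "/domains/sa/ALL", "/go/rfc2606/sa/ALL"]
```

Returning nil for skipped links and calling `compact` keeps the filtering and the rewriting in one pass. If you would rather keep external links untouched, return `href` instead of nil in that branch.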
Thanks! That works. – piao1780