Fastest way to find a specific word in an XHTML document

What is the fastest way to do this?

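I have HTML documents that may (or may not) contain the word "Instructions", followed by several lines of instructions. I want to parse the pages that contain the word "Instructions" along with the lines that follow it.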

2009-12-11 rordude

When you find the word "Instructions", is it followed by a fixed or a variable number of lines? – Asaph 2009-12-11 04:39:29

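This isn't the most "correct" way, but it mostly works. Use a Ruby regular expression to find the string.

The regex you want is something like /instructions([^<]+)/. This assumes the instructions end at a < character.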
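
A minimal sketch of that pattern in use. The sample string `html` is made up for illustration, and the `i` flag is an assumption added here so that the capitalized "Instructions" from the question also matches; the lowercase pattern as written above would not match it.

```ruby
# Hypothetical sample input; a real page would be read from a file.
html = "<p>Instructions\n- Step one\n- Step two\n</p><p>More</p>"

# Capture everything after "instructions" up to the next "<".
# The /i flag is an assumption added so "Instructions" matches too.
match = html.match(/instructions([^<]+)/i)

# match is nil when the word does not occur at all.
instructions = match ? match[1].strip : nil
puts instructions if instructions
```

Because the capture stops at the first `<`, any inline markup inside the instructions (a link or a `<br>`, for example) would cut the result short.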

2009-12-11 04:44:20 Jamie

You can start by simply testing whether a document matches:

if File.read('docname.html') =~ /Instructions/
  # Parse to remove the instructions.
end

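I'd then recommend using a tool such as Hpricot to extract the part you want. How hard that is depends on how your HTML is structured. If you'd like more specific help, please post some more details about the structure.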

2009-12-11 04:54:45 Peter

Maybe something along these lines:

require 'rubygems' 
require 'nokogiri' 

def find_instructions(doc)
  doc.xpath('//body//text()').each do |text|
    # String is not Enumerable in Ruby 1.9+, so split into lines explicitly.
    instructions = text.content.lines.select do |line|
      # The flip-flop matches each section starting with
      # "Instructions" and ending with a blank line.
      true if (line =~ /Instructions/)..(line =~ /^\s*$/)
    end
    return instructions unless instructions.empty?
  end
  []
end

puts find_instructions(Nokogiri::HTML(DATA.read)) 


__END__ 
<html> 
<head> 
    <title>Instructions</title> 
</head> 
<body> 
lorem 
ipsum 
<p> 
lorem 
ipsum 
<p> 
lorem 
ipsum 
<p> 
Instructions 
- Browse stackoverflow 
- Answer questions 
- ??? 
- Profit 

More 
<p> 
lorem 
ipsum 
</body> 
</html>

2009-12-11 10:50:47 akuhn
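
The flip-flop range in the Nokogiri answer above is a rarely used Ruby feature, so here is a rough equivalent with an explicit state variable. It is a sketch only: the helper name `instruction_lines` and the sample `text` are hypothetical, and it works on a plain string rather than on Nokogiri text nodes.

```ruby
# Hypothetical sample text standing in for one text node's content.
text = "lorem\nInstructions\n- Browse\n- Answer\n\nMore\nipsum\n"

# Start collecting at a line containing "Instructions" and stop after
# the first blank line, which is still included (as with the flip-flop).
def instruction_lines(text)
  collecting = false
  text.each_line.select do |line|
    collecting = true if line =~ /Instructions/
    keep = collecting
    collecting = false if collecting && line.strip.empty?
    keep
  end
end

section = instruction_lines(text)
puts section
```

Spelling the state out this way makes it easier to change the stopping rule later, for example to end at the next heading instead of at a blank line.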
