Regex works in theory, but not when the code runs

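I'm trying to search with this regex in Ruby: <div class="ms3">(\n.*?)+<, but as soon as I add the last character, "<", it stops working completely. I've tested it in Rubular, where the regex works fine. I'm writing the code in RubyMine, but I also tested it from PowerShell with the same result. There is no error message. When I run <div class="ms3">(\n.*?)+ it prints <div class="ms3">, which is exactly what I'm looking for, but as soon as I add the "<" nothing comes out.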

My code:

#!/usr/bin/ruby 
# encoding: utf-8 

File.open('ms3.txt', 'w') do |fo| 
    fo.puts File.foreach('input.txt').grep(/<div class="ms3">(\n.*?)+/) 
end

Some of what I'm searching through:

<div class="ms3"> 
    <span xml:lang="zxx"><span xml:lang="zxx">Still the tone of the remainder of the chapter is bleak. The</span> <span class="See_In_Glossary" xml:lang="zxx">DAY OF THE <span class="Name_Of_God" xml:lang="zxx">LORD</span></span> <span xml:lang="zxx">holds no hope for deliverance (5.16–18); the futility of offering sacrifices unmatched by common justice is once more underlined, and exile seems certain (5.21–27).</span></span> 
    </div> 

    <div class="Paragraph"> 
    <span class="Verse_Number" id="idAMO_5_1" xml:lang="zxx">1</span><span class="scrText">Listen, people of Israel, to this funeral song which I sing over you:</span> 
    </div> 

    <div class="Stanza_Break"></div>

The full regex I need is <div class="ms3">(\n.*?)+<\/div>. It picks up the first part and nothing else.


2014-11-25 Rebs

[Don't parse HTML with regex!](http://stackoverflow.com/a/1732454/418066) – Biffen 2014-11-25 12:44:20

Besides [don't parse HTML with regex](http://stackoverflow.com/a/1732454/418066), you may have omitted the **multiline** modifier: '... grep(/.../m)'. – mudasobwa 2014-11-25 12:52:18

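I love that rant and will keep it in mind. However, I'm not 100% sure that what I'm doing is parsing (if I understand correctly what parsing is). All I want to do is extract certain bits of text containing HTML from one txt file into another, that's all. Unless that is exactly what parsing is? I'm not trying to process the HTML or modify it; the end result will be handled in another program without regex. – Rebs 2014-11-25 13:25:57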

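Your problem starts with using File.foreach('input.txt'), which splits the input into lines. This means the pattern is matched against each line separately, so no line matches the pattern (by definition, no line has a \n in the middle of it).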
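To make the line-splitting effect concrete, here is a minimal sketch. The sample string is a hypothetical stand-in for input.txt, and StringIO is used only so the snippet runs without touching any files; each_line yields the same kind of newline-terminated lines that File.foreach does.

```ruby
require 'stringio'

# Hypothetical sample standing in for the contents of input.txt
html = "<div class=\"ms3\">\n    <span>text</span>\n    </div>\n"
pattern = /<div class="ms3">(\n.*?)+</

# Line by line: every element ends with "\n" and has nothing after it,
# so the "<" that must follow the newline can never be found
lines = StringIO.new(html).each_line.to_a
line_matches = lines.grep(pattern)

# Whole string: the newline is followed by more text, so the pattern can match
whole_match = html.match(pattern)

p lines.length
p line_matches
p whole_match[0]
```

The grep over individual lines comes back empty, while the same pattern applied to the full string finds the opening div together with the start of the next line.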

You should have better luck reading the whole text as one block and using match on it:

File.read('input.txt').match(/<div class="ms3">(\n.*?)+<\/div>/) 
# => #<MatchData "<div class=\"ms3\">\n <span xml:lang=\"zxx\"> 
# => <span xml:lang=\"zxx\">Still the tone of the remainder of the chapter is bleak. The</span> 
# => <span class=\"See_In_Glossary\" xml:lang=\"zxx\">DAY OF THE 
# => <span class=\"Name_Of_God\" xml:lang=\"zxx\">LORD</span></span> 
# => <span xml:lang=\"zxx\">holds no hope for deliverance (5.16–18); 
# => the futility of offering sacrifices unmatched by common justice is once more 
# => underlined, and exile seems certain (5.21–27).</span></span>\n </div>" 1:"\n ">
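If the file contains more than one such block, a variation along these lines might help. This is only a sketch under assumptions: the sample text is hypothetical, the pattern is a simplified alternative (lazy .*? with the /m modifier mentioned in the comments) and not the pattern from the question, and it assumes the ms3 divs do not contain nested divs.

```ruby
# Hypothetical sample standing in for File.read('input.txt')
text = <<~HTML
  <div class="ms3">
      <span>first block</span>
  </div>

  <div class="Paragraph">
      <span>not wanted</span>
  </div>

  <div class="ms3">
      <span>second block</span>
  </div>
HTML

# /m lets "." match newlines; the lazy ".*?" stops at the first "</div>"
ms3_pattern = /<div class="ms3">.*?<\/div>/m

# With no capture groups, scan returns the full text of every match
blocks = text.scan(ms3_pattern)

puts blocks.length
puts blocks
# To write the result out as in the question:
# File.write('ms3.txt', blocks.join("\n"))
```

Because the pattern stops at the first closing div, a block with nested divs would be cut short; in that case an HTML parser is the safer route.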


2014-11-25 12:53:21
