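Library to parse ERB files

2010-04-06

I'm trying to parse, not evaluate, Rails ERB files in an Hpricot/Nokogiri-style way. The files I'm trying to parse contain HTML fragments mixed with dynamic content generated with ERB (standard Rails view files). I'm looking for a library that will not only parse the surrounding content the way Hpricot or Nokogiri would, but will also treat the ERB symbols, <%, <%=, etc., as though they were HTML/XML tags.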

Ideally, I'd get back a DOM-like structure in which the <%, <%=, etc. symbols are included as node types of their own.

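I know it's possible to hack something together with regular expressions, but I'm looking for something more reliable, because I'm developing a tool that has to run over a very large codebase of views, where both the HTML content and the ERB content matter.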

For example, content such as:

 
blah blah blah 
<div>My Great Text <%= my_dynamic_expression %></div> 

would return a tree-like structure:

 
root
- text_node (blah blah blah)
- element (div)
    - text_node (My Great Text)
    - erb_node (<%=)
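For comparison, here is a minimal hand-rolled sketch of that tree, built with Ruby's standard-library `StringScanner` rather than an existing library. `Node` and `parse_erb` are hypothetical names, and the sketch assumes fairly well-formed markup: it does not handle comments, doctypes, or ERB and `>` characters inside attribute values, and it silently ignores unmatched closing tags.

```ruby
require 'strscan'

# Hypothetical node type; kind is :root, :text_node, :element or :erb_node.
# For ERB nodes, value holds the opening marker and code holds the Ruby source.
Node = Struct.new(:kind, :value, :children, :code) do
  def to_s(depth = 0)
    label = kind == :root ? 'root' : "#{kind} (#{value})"
    (["#{'  ' * depth}#{label}"] + children.map { |c| c.to_s(depth + 1) }).join("\n")
  end
end

# Builds a DOM-like tree in which ERB tags are nodes of their own.
def parse_erb(source)
  root = Node.new(:root, nil, [])
  stack = [root]                          # open elements, innermost last
  s = StringScanner.new(source)
  until s.eos?
    if s.scan(/<%([=#-]?)(.*?)-?%>/m)     # <% %>, <%= %>, <%# %>, <%- %>
      stack.last.children << Node.new(:erb_node, "<%#{s[1]}", [], s[2].strip)
    elsif s.scan(%r{</\s*([\w-]+)\s*>})   # closing tag: pop only if it matches
      stack.pop if stack.size > 1 && stack.last.value == s[1]
    elsif s.scan(%r{<([\w-]+)[^>]*?(/?)>}) # opening or self-closing tag
      el = Node.new(:element, s[1], [])
      stack.last.children << el
      stack.push(el) if s[2].empty?       # self-closing tags get no children
    else                                  # text up to the next '<' (or a lone '<')
      text = (s.scan(/[^<]+/) || s.getch).strip
      stack.last.children << Node.new(:text_node, text, []) unless text.empty?
    end
  end
  root
end

puts parse_erb("blah blah blah\n<div>My Great Text <%= my_dynamic_expression %></div>\n")
# prints:
# root
#   text_node (blah blah blah)
#   element (div)
#     text_node (My Great Text)
#     erb_node (<%=)
```

Keeping the Ruby code in a separate `code` field means a tool can still inspect what each ERB node evaluates, while the tree shape matches the one requested.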

Answers


I finally ended up solving this problem with RLex, http://raa.ruby-lang.org/project/ruby-lex/, the Ruby version of lex, using the following grammar:

 
%{ 

#define NUM 257 

#define OPTOK 258 
#define IDENT 259 
#define OPETOK 260 
#define CLSTOK 261 
#define CLTOK 262 
#define FLOAT 263 
#define FIXNUM 264 
#define WORD 265 
#define STRING_DOUBLE_QUOTE 266 
#define STRING_SINGLE_QUOTE 267 

#define TAG_START 268 
#define TAG_END 269 
#define TAG_SELF_CONTAINED 270 
#define ERB_BLOCK_START 271 
#define ERB_BLOCK_END 272 
#define ERB_STRING_START 273 
#define ERB_STRING_END 274 
#define TAG_NO_TEXT_START 275 
#define TAG_NO_TEXT_END 276 
#define WHITE_SPACE 277 
%} 

digit [0-9] 
blank [ ] 
letter [A-Za-z] 
name1 [A-Za-z_] 
name2 [A-Za-z_0-9] 
valid_tag_character [A-Za-z0-9"'[email protected]_():/ ] 
ignore_tags style|script 
%% 

{blank}+"\n"     { return [ WHITE_SPACE, yytext ] } 
"\n"{blank}+     { return [ WHITE_SPACE, yytext ] } 
{blank}+"\n"{blank}+     { return [ WHITE_SPACE, yytext ] } 

"\r"     { return [ WHITE_SPACE, yytext ] } 
"\n"   { return[ yytext[0], yytext[0..0] ] }; 
"\t"   { return[ yytext[0], yytext[0..0] ] }; 

^{blank}+  { return [ WHITE_SPACE, yytext ] } 

{blank}+$  { return [ WHITE_SPACE, yytext ] }; 

"" { return [ TAG_NO_TEXT_START, yytext ] } 
"" { return [ TAG_NO_TEXT_END, yytext ] } 
""     { return [ TAG_SELF_CONTAINED, yytext ] } 
"" { return [ TAG_SELF_CONTAINED, yytext ] } 
"" { return [ TAG_START, yytext ] } 
"" { return [ TAG_END, yytext ] } 

"" { return [ ERB_BLOCK_END, yytext ] } 
"" { return [ ERB_STRING_END, yytext ] } 


{letter}+  { return [ WORD, yytext ] } 


\".*\"   { return [ STRING_DOUBLE_QUOTE, yytext ] } 
'.*'     { return [ STRING_SINGLE_QUOTE, yytext ] } 
.   { return [ yytext[0], yytext[0..0] ] } 

%% 
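If RLex isn't available, a tokenizer in the same spirit can be sketched with only the standard library's `StringScanner`. This is a swapped-in technique, not the author's code: it emits `[type, text]` pairs named after the grammar's `#define`s, but several rule patterns above are empty in this copy, so which text each token covers is an assumption. `ERB_LEX_RULES` and `erb_tokens` are hypothetical names; `:CHAR` replaces the grammar's "character code as token" rule, whitespace runs are merged into one token, and quoted strings stop at the first closing quote instead of using the grammar's greedy `\".*\"`.

```ruby
require 'strscan'

# Hypothetical rule table: [token type, pattern]. As in lex, the first rule
# that matches at the current position wins, so specific rules come first.
ERB_LEX_RULES = [
  [:ERB_STRING_START,    /<%=/],
  [:ERB_BLOCK_START,     /<%/],
  [:ERB_END,             /-?%>/],                 # renamed to *_END below
  [:TAG_NO_TEXT_START,   /<(?:script|style)\b[^>]*>/i],
  [:TAG_NO_TEXT_END,     %r{</(?:script|style)\s*>}i],
  [:TAG_END,             %r{</[\w-]+\s*>}],
  [:TAG_SELF_CONTAINED,  %r{<[\w-]+[^>]*/>}],
  [:TAG_START,           /<[\w-]+[^>]*>/],
  [:WHITE_SPACE,         /[ \t\r\n]+/],
  [:WORD,                /[A-Za-z]+/],
  [:STRING_DOUBLE_QUOTE, /"[^"]*"/],
  [:STRING_SINGLE_QUOTE, /'[^']*'/],
  [:CHAR,                /./m]                    # any other single character
].freeze

# Returns [type, text] pairs that together cover the whole input.
def erb_tokens(source)
  s = StringScanner.new(source)
  tokens = []
  erb_open = nil                 # which ERB opener is waiting for its '%>'
  until s.eos?
    type, = ERB_LEX_RULES.find { |_, re| s.scan(re) }
    case type
    when :ERB_STRING_START, :ERB_BLOCK_START
      erb_open = type
    when :ERB_END
      # '%>' closes whichever kind of ERB tag was opened last
      type = erb_open == :ERB_STRING_START ? :ERB_STRING_END : :ERB_BLOCK_END
      erb_open = nil
    end
    tokens << [type, s.matched]
  end
  tokens
end

p erb_tokens('<div><%= x %></div>').map(&:first)
# prints: [:TAG_START, :ERB_STRING_START, :WHITE_SPACE, :WORD, :WHITE_SPACE, :ERB_STRING_END, :TAG_END]
```

One known limitation: Ruby code inside an ERB tag that happens to look like a tag (for example `a<b %>`) is still tokenized as an HTML tag.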

It's not a complete grammar, but for my purposes (finding and re-emitting text) it works. I combined the grammar with this small piece of code:

 
    # MakeYourOwnCallbackHandler stands for your own class implementing
    # the callbacks used below.
    text_handler = MakeYourOwnCallbackHandler.new

    l = Erblex.new                       # lexer class generated by rlex
    l.yyin = File.open(file_name, "r")

    loop do
      a, v = l.yylex                     # a = token code, v = matched text
      break if a == 0                    # token 0 marks the end of input

      if a < WORD
        # Single characters and the numeric/operator tokens (codes below WORD)
        text_handler.character(v.to_s, a)
      else
        case a
        when WORD
          text_handler.text(v.to_s)
        when TAG_START
          text_handler.start_tag(v.to_s)
        when TAG_END
          text_handler.end_tag(v.to_s)
        when WHITE_SPACE
          text_handler.white_space(v.to_s)
        when ERB_BLOCK_START
          text_handler.erb_block_start(v.to_s)
        when ERB_BLOCK_END
          text_handler.erb_block_end(v.to_s)
        when ERB_STRING_START
          text_handler.erb_string_start(v.to_s)
        when ERB_STRING_END
          text_handler.erb_string_end(v.to_s)
        when TAG_NO_TEXT_START
          text_handler.ignorable_tag_start(v.to_s)
        when TAG_NO_TEXT_END
          text_handler.ignorable_tag_end(v.to_s)
        when STRING_DOUBLE_QUOTE
          text_handler.string_double_quote(v.to_s)
        when STRING_SINGLE_QUOTE
          text_handler.string_single_quote(v.to_s)
        when TAG_SELF_CONTAINED
          text_handler.tag_self_contained(v.to_s)
        end
      end
    end
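The loop needs a handler that responds to every callback it calls. Below is a minimal sketch of one, assuming the "find and re-emit text" use case described above: `TextCollector` is a hypothetical class (not part of RLex) that echoes every token into `output`, so the input is reproduced, and records `WORD` tokens that appear outside ERB tags in `words`. It does not skip the contents of `script`/`style` tags.

```ruby
# Hypothetical callback handler for the driver loop.
class TextCollector
  attr_reader :output, :words

  def initialize
    @output = +""    # everything re-emitted, in order
    @words  = []     # WORD tokens seen outside ERB tags
    @in_erb = false
  end

  # Receives single characters plus their numeric token code.
  def character(text, _token_code)
    @output << text
  end

  def text(word)
    @words << word unless @in_erb
    @output << word
  end

  def erb_block_start(text)
    @in_erb = true
    @output << text
  end
  alias erb_string_start erb_block_start

  def erb_block_end(text)
    @in_erb = false
    @output << text
  end
  alias erb_string_end erb_block_end

  # The remaining callbacks only pass their text through.
  %i[start_tag end_tag white_space ignorable_tag_start ignorable_tag_end
     string_double_quote string_single_quote tag_self_contained].each do |name|
    define_method(name) { |text| @output << text }
  end
end
```

With this class, the loop would start with `text_handler = TextCollector.new`; afterwards `text_handler.words` holds the static text and `text_handler.output` the re-emitted file.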

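Did you have any trouble with the Lexer.rb/Erblex.rb generated by rlex being incomplete? I've tried it on OS X and Ubuntu, but the generated lexer .rb file ends abruptly in the middle of a big 'case'/'when' block. I've tried both 'rlex grammar' and 'rlex --output LexerClassName grammar', where 'grammar' corresponds to a file named 'grammar.rl'. I have Ruby 1.8.7. – 2010-10-22 20:35:43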


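Hi Sarah, I did have that problem. I submitted a bug fix to the RLex owner. If you're interested I can send you the patch file, but it is a bug you have to fix in RLex itself. – 2010-11-10 19:08:38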


Could you post the patch somehow? – user43685 2010-12-17 17:30:31


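I had a similar problem recently. The approach I took was to write a small script (erblint.rb) that uses string substitution to convert the ERB tags (<% %> and <%= %>) into XML tags, and then parses the result with Nokogiri.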

Take a look at the code below to see what I mean:

#!/usr/bin/env ruby
require 'rubygems'
require 'nokogiri'

# This is a simple program that reads in a Ruby ERB file and parses
# it as an XHTML file. Specifically, it makes a decent attempt at
# converting the ERB tags (<%= %> and <% %>) to XML tags (<erb-disp/>
# and <erb-eval/>, respectively).
#
# Once the document has been parsed, it will be validated and any
# error messages will be displayed.
#
# More complex option and error handling is left as an exercise to the user.

abort "Usage: #{File.basename($0)} <filename>" if ARGV.empty?

filename = ARGV[0]

begin
  s = File.read(filename)
  puts "\n*** Parsing #{filename} ***\n\n"

  # Substitute the standard ERB tags to convert them to XML tags:
  #   <%= ... %> becomes <erb-disp> ... </erb-disp>
  #   <% ... %>  becomes <erb-eval> ... </erb-eval>
  # The <%= substitution must run first, or the second pattern would claim it.
  #
  # Note that this won't work for more complex expressions such as:
  #   <a href=<% @some_object.generate_url -%> >link text</a>
  # Of course, this is not great style, anyway...
  s.gsub!(/<%=(.+?)%>/m, '<erb-disp>\1</erb-disp>')
  s.gsub!(/<%(.+?)%>/m, '<erb-eval>\1</erb-eval>')

  doc = Nokogiri::XML(s) do |config|
    # put more config options here if required
    # config.strict
  end

  puts doc.to_xhtml(:indent => 2, :encoding => 'UTF-8')
  puts "Huzzah, no errors!" if doc.errors.empty?

  # Otherwise, print each error message
  doc.errors.each { |e| puts "Error at line #{e.line}: #{e}" }
rescue SystemCallError => e
  # File-system problems only (missing file, no permission, ...)
  puts "Oops! Cannot open #{filename}: #{e.message}"
rescue Nokogiri::XML::SyntaxError => e
  # Only raised when config.strict is enabled above
  puts "Parse error: #{e}"
end
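The script copies the Ruby code inside each tag verbatim, so an expression such as `<%= a < b %>` produces markup that is not well-formed XML. A hedged single-pass variant of the two `gsub!` calls is sketched below, using plain Ruby only so it can be tried without Nokogiri; `erb_to_xml` and `XML_ESCAPES` are hypothetical names. One regex with a block picks the element name from the optional `=`, so the ordering concern disappears, and `&`, `<` and `>` in the Ruby code are escaped.

```ruby
# Characters that must not appear raw inside XML element content.
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze

# <%= ... %> becomes <erb-disp>, every other <% ... %> becomes <erb-eval>.
def erb_to_xml(source)
  source.gsub(/<%(=?)(.+?)%>/m) do
    marker, code = $1, $2          # capture before calling gsub again
    tag = marker == '=' ? 'erb-disp' : 'erb-eval'
    "<#{tag}>#{code.gsub(/[&<>]/, XML_ESCAPES)}</#{tag}>"
  end
end

puts erb_to_xml('<p><%= a < b ? "yes" : "no" %></p>')
# prints: <p><erb-disp> a &lt; b ? "yes" : "no" </erb-disp></p>
```

The result can be passed to `Nokogiri::XML` just as in the script. ERB inside attribute values (the `href` case noted in the script's comments) is still not handled, because the generated element ends up inside the attribute.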

I've put this up as a Gist on GitHub: https://gist.github.com/787145


Love it! I actually put it inside an HTML comment instead, to avoid interference from anything in the ERB that would break values in the HTML code. – viniciusnz 2017-08-03 16:13:18