How can I ignore HTML tags in XML?

I am using Ruby 1.8.7 and receive XML content as a string in an API response. I want to parse this response so that I get the content without the HTML tags:

<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<response>\n <data>\n <publisher_share_percent>0.0</publisher_share_percent>\n <detailed_description>&lt;b&gt;this is the testing detailed&lt;/b&gt; </detailed_description>\n <title>Only &#163;5.00. food (Regular &#163;50.00/90% discount)</title>\n </data>\n <request_id>ed96dd50-3127-012f-3e93-042b2b8686e6</request_id>\n <message>The resource has been created successfully.</message>\n <status>201</status>\n</response>\n


2012-02-04 Bijendra

You can use CGI::unescapeHTML.

require 'cgi' 
CGI::unescapeHTML("Usage: foo &quot;bar&quot; &lt;baz&gt;") 
# => "Usage: foo \"bar\" <baz>"
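Applied to the question's data, this could look like the sketch below. CGI.unescapeHTML only decodes the entities, so the tags are still present afterwards; the gsub step that removes them is an assumption added here, not part of the answer, and it is a naive pattern that will not cope with every kind of HTML.

```ruby
require 'cgi'

# Escaped text as it appears inside <detailed_description> in the question.
escaped = "&lt;b&gt;this is the testing detailed&lt;/b&gt; "

# Step 1: decode the entities back into literal markup.
html = CGI.unescapeHTML(escaped)

# Step 2 (assumption, not from the answer): drop anything that looks like a tag.
# This regexp is deliberately simple and is not a full HTML parser.
plain = html.gsub(/<[^>]*>/, '').strip

puts html   # the decoded markup, tags included
puts plain  # the text with the tags removed
```

For anything beyond simple inline tags, a real HTML parser is the safer choice.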


2012-02-04 12:36:33 rubyprince

If you treat the XML as what it is, XML, and parse it with an XML parser, the job becomes much easier:

require 'nokogiri' 

xml = <<EOT
<?xml version="1.0" encoding="UTF-8"?>
<response>
    <data>
        <publisher_share_percent>0.0</publisher_share_percent>
        <detailed_description>&lt;b&gt;this is the testing detailed&lt;/b&gt; </detailed_description>
        <title>Only &#163;5.00. food (Regular &#163;50.00/90% discount)</title>
    </data>
    <request_id>ed96dd50-3127-012f-3e93-042b2b8686e6</request_id>
    <message>The resource has been created successfully.</message>
    <status>201</status>
</response>
EOT

doc = Nokogiri::XML(xml) 
puts doc.at('detailed_description').text 
puts doc.at('title').text

Saving the file and running it outputs:

ruby ~/Desktop/test2.rb 
<b>this is the testing detailed</b> 
Only £5.00. food (Regular £50.00/90% discount)


2012-02-05 03:21:32
