2009-09-02 115 views

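Ruby CGI.unescapeHTML generates strange characters

I have backed up a bunch of Markdown-formatted comments to an XML document. That of course means I need to HTML-escape them. When I then use CGI.unescapeHTML, it adds some strange characters to the markup, and they do not render well in every browser.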

Specifically, it replaces two spaces with "\302\240", but not consistently. How can I stop this behavior?

For example:

require 'cgi'

# %q{} keeps the embedded double quotes and apostrophes literal.
s = %q{I am seeing more and more <a href="http://github.com/aslakhellesoy/cucumber/tree/master">Cucumber</a> usage.  This is a good thing!  But I'm also seeing people who are not using regular expressions to their fullest.  Here are some quick regex tips to keep you features readable:

* `(?:a|an)` -- using a this construct you can group things wihout actually matching them.  I'm seeing a lot of steps that have unused params because someone needed a group but didn't know how to avoid capturing it&#x000A}
CGI.unescapeHTML s
# => "I am seeing more and more <a href=\"http://github.com/aslakhellesoy/cucumber/tree/master\">Cucumber</a> usage.\302\240 This is a good thing!\302\240 But I'm..."

What version are you using? I'm not seeing this on 1.8.7. – 2009-09-02 21:01:46


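I found out that this is caused by Haml using non-breaking space characters as spaces. It works here because SO's formatting fixes the problem. Still, that was a few hours of work. – 2009-09-03 18:55:03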

Answers


These are non-breaking spaces: \302\240 is the octal form of the two UTF-8 bytes that encode U+00A0. Read up on Wikipedia:

In computer-based text processing and digital typesetting, a 
non-breaking space, also known as a no-break space or 
non-breakable space (NBSP), is a variant of the space character 
that prevents an automatic line break (line wrap) at its position. 
In certain formats (such as HTML), it also prevents the 
"collapsing" of multiple consecutive whitespace characters into a
single space. The non-breaking space is also known as a hard space 
or fixed space. In Unicode, it is encoded as U+00A0 no-break space 
(HTML: &#160; &nbsp;).
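
To see this concretely, here is a minimal Ruby sketch. It assumes a UTF-8 source string and that the input contains the numeric entity &#160; (as the Haml comment above suggests). The helper name unescape_with_plain_spaces and the sample string are made up for illustration. Whether replacing non-breaking spaces is acceptable depends on whether they matter in your markup.

```ruby
require 'cgi'

NBSP = "\u00A0" # U+00A0 NO-BREAK SPACE

# Hypothetical helper (not part of CGI): unescape the HTML, then turn
# every non-breaking space into an ordinary ASCII space.
def unescape_with_plain_spaces(html)
  CGI.unescapeHTML(html).gsub(NBSP, ' ')
end

# Assumed sample input: a sentence break written as "&#160;" plus a space.
raw = "usage.&#160; This is a good thing!"
decoded = CGI.unescapeHTML(raw)

# The UTF-8 bytes of U+00A0, printed as octal escapes.
puts NBSP.bytes.map { |b| format('\\%o', b) }.join

# The decoded string now contains a real non-breaking space character.
puts decoded.include?(NBSP)

# After normalizing, only ordinary spaces remain.
puts unescape_with_plain_spaces(raw).inspect
```

On Ruby 1.8 a string is shown byte by byte, which is why the non-breaking space appears as "\302\240" in the output above. CGI.unescapeHTML is decoding the entity correctly, so the fix belongs either in the step that produced the entity or in a post-processing step like the gsub here.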