How do I convert extended ASCII to HTML entity names in Python?

I'm currently doing this to replace extended-ASCII characters with their HTML entity-number equivalents:

s.encode('ascii', 'xmlcharrefreplace')

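What I'd like to do instead is convert to the HTML entity-name equivalent (i.e. &copy; instead of &#169;). The small program below shows what I'm trying to do and how it fails. Is there a way to do this other than a find/replace?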

#coding=latin-1 

def convertEntities(s): 
    return s.encode('ascii', 'xmlcharrefreplace') 

ok = 'ascii: !@#$%^&*()<>'
not_ok = u'extended-ascii: ©®°±¼' 

ok_expected = ok 
not_ok_expected = u'extended-ascii: &copy;&reg;&deg;&plusmn;&frac14;' 

ok_2 = convertEntities(ok) 
not_ok_2 = convertEntities(not_ok) 

if ok_2 == ok_expected: 
    print 'ascii worked' 
else: 
    print 'ascii failed: "%s"' % ok_2 

if not_ok_2 == not_ok_expected: 
    print 'extended-ascii worked' 
else: 
    print 'extended-ascii failed: "%s"' % not_ok_2
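
For reference, the numeric output that makes the program above report a failure can be reproduced with a short sketch. It assumes Python 3 rather than the Python 2 of the original, and the helper name `to_numeric_refs` is hypothetical.

```python
# Python 3 sketch (assumption: the original code targets Python 2).
# to_numeric_refs is a hypothetical helper name, not from the question.

def to_numeric_refs(s):
    # 'xmlcharrefreplace' swaps every non-ASCII character for a numeric
    # character reference such as &#169;, never for a named entity.
    return s.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(to_numeric_refs("extended-ascii: ©®°±¼"))
```

The error handler only ever produces numeric references, so the names have to come from a separate lookup.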

Source

2010-07-22 Jason Coon

Update: this is the solution I went with, including a small fix that checks that entitydefs contains a mapping for the character we have.

import htmlentitydefs

def convertEntities(s):
    return ''.join([getEntity(c) for c in s])

def getEntity(c):
    ord_c = ord(c)
    # Only replace non-ASCII characters that have a named entity
    if ord_c > 127 and ord_c in htmlentitydefs.codepoint2name:
        return "&%s;" % htmlentitydefs.codepoint2name[ord_c]
    return c
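
The same per-character idea can be sketched for Python 3, on the assumption that `html.entities` is used in place of `htmlentitydefs` (the module was renamed in Python 3). The snake_case function names are illustrative only.

```python
# Python 3 sketch of the per-character approach above.
# Assumption: html.entities stands in for Python 2's htmlentitydefs.
from html.entities import codepoint2name


def get_entity(ch):
    code = ord(ch)
    # Leave ASCII alone; replace only non-ASCII characters that have a name.
    if code > 127 and code in codepoint2name:
        return "&%s;" % codepoint2name[code]
    return ch


def convert_entities(s):
    return "".join(get_entity(ch) for ch in s)


print(convert_entities("extended-ascii: ©®°±¼"))
print(convert_entities("ascii: a < b & c"))
```

The `code > 127` test is what keeps `<`, `>` and `&` untouched, which matches the `ok_expected` check in the question.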

2010-07-22 20:13:40

Edit

As others have mentioned, there is htmlentitydefs, which I never knew about. It would work with my code like this:

from htmlentitydefs import entitydefs as symbols 

for tag, val in symbols.iteritems(): 
    mystr = mystr.replace("&{0};".format(tag), val)

And that should work.
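
A Python 3 rendering of the loop above may help show which direction it runs in. This is only a sketch: it assumes `html.entities.entitydefs` as the Python 3 counterpart of `htmlentitydefs.entitydefs`, and `names_to_chars` is a hypothetical wrapper name. As written, the loop turns entity names into characters.

```python
# Python 3 sketch of the loop above (assumption: html.entities replaces
# htmlentitydefs). names_to_chars is a hypothetical wrapper name.
from html.entities import entitydefs


def names_to_chars(s):
    # entitydefs maps an entity name ('copy') to its character ('©'),
    # so each pass replaces "&name;" with the character itself.
    for name, char in entitydefs.items():
        s = s.replace("&{0};".format(name), char)
    return s


print(names_to_chars("&copy; 2010"))
```

Going from characters to names needs the inverse mapping, which is what `codepoint2name` provides.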

2010-07-22 20:00:20

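That's why I said "other than a find/replace"; in other words, I don't want to build a 128-character dictionary. This solution does work for the code I posted, though – 2010-07-22 20:03:25

Well, I just adapted my code to use the 'htmlentitydefs' module that others mentioned. Now you don't have to compile the dictionary yourself :) – 2010-07-22 20:09:51

Looks better... It may need an ASCII check added, though, because I don't want "<" replaced with "&lt;" – 2010-07-22 20:17:45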

Is htmlentitydefs what you want?

import htmlentitydefs 
htmlentitydefs.codepoint2name.get(ord(c),c)
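
A short sketch of what that lookup returns, assuming Python 3's `html.entities` in place of `htmlentitydefs`; `lookup` is a hypothetical helper name. The result is the bare entity name, without the surrounding `&` and `;`, and ASCII characters such as `<` also have entries.

```python
# Python 3 sketch (assumption: html.entities replaces htmlentitydefs).
# lookup is a hypothetical helper wrapping the one-liner above.
from html.entities import codepoint2name


def lookup(c):
    # Returns the bare entity name if one exists, else the character itself.
    return codepoint2name.get(ord(c), c)


print(lookup("©"))  # a bare name, still missing "&" and ";"
print(lookup("<"))  # ASCII markup characters have names too
print(lookup("a"))  # no entity name, so the character comes back unchanged
```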

2010-07-22 20:01:49

Yes, that's what I was looking for... though it's not quite there. I think I have a solution based on it – 2010-07-22 20:12:12

I don't know how to do it directly, but I think the htmlentitydefs module will be of use. An example can be found here.

2010-07-22 20:03:43 SiggyF

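Are you sure you don't want the conversion to be reversible? Your ok_expected string indicates that you don't want existing & characters escaped, so the conversion would be one-way. The code below assumes & should be escaped, but if you really don't want that, just remove the cgi.escape call.

In any case, I'd combine your original approach with a regular-expression substitution: encode as before, then fix up the numeric entities. That way you don't end up mapping every single character through your getEntity function.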

#coding=latin-1 
import cgi 
import re 
import htmlentitydefs 

def replace_entity(match): 
    c = int(match.group(1)) 
    name = htmlentitydefs.codepoint2name.get(c, None) 
    if name: 
        return "&%s;" % name
    return match.group(0) 

def convertEntities(s): 
    s = cgi.escape(s) # Remove if you want ok_expected to pass! 
    s = s.encode('ascii', 'xmlcharrefreplace') 
    s = re.sub("&#([0-9]+);", replace_entity, s) 
    return s 

ok = 'ascii: !@#$%^&*()<>'
not_ok = u'extended-ascii: ©®°±¼' 

ok_expected = ok 
not_ok_expected = u'extended-ascii: &copy;&reg;&deg;&plusmn;&frac14;' 

ok_2 = convertEntities(ok) 
not_ok_2 = convertEntities(not_ok) 

if ok_2 == ok_expected: 
    print 'ascii worked' 
else: 
    print 'ascii failed: "%s"' % ok_2 

if not_ok_2 == not_ok_expected: 
    print 'extended-ascii worked' 
else: 
    print 'extended-ascii failed: "%s"' % not_ok_2
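
A Python 3 sketch of the same encode-then-fix-up idea follows. It rests on two assumptions: `html.entities` replaces `htmlentitydefs`, and `html.escape(..., quote=False)` stands in for `cgi.escape`, which is no longer available in current Python 3 releases. The `escape` keyword argument is a hypothetical addition for illustration, not part of the answer above.

```python
# Python 3 sketch of the encode-then-regex approach.
# Assumptions: html.entities replaces htmlentitydefs, and html.escape
# replaces cgi.escape. The `escape` flag is a hypothetical addition.
import html
import re
from html.entities import codepoint2name


def replace_entity(match):
    code = int(match.group(1))
    name = codepoint2name.get(code)
    if name:
        return "&%s;" % name
    # No named entity exists: keep the numeric reference as it is.
    return match.group(0)


def convert_entities(s, escape=True):
    if escape:
        # Escapes &, < and > so the result can be decoded unambiguously.
        s = html.escape(s, quote=False)
    s = s.encode("ascii", "xmlcharrefreplace").decode("ascii")
    return re.sub(r"&#([0-9]+);", replace_entity, s)


print(convert_entities("extended-ascii: ©®°±¼"))
print(convert_entities("a & b < c"))
print(convert_entities("a & b < c", escape=False))
```

Characters without a named entity fall through as numeric references, so the output stays pure ASCII either way.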

2010-07-22 20:43:54 Duncan
