How do I convert characters like "&#58;" to ":" in Python?

Possible duplicate:
Convert XML/HTML Entities into Unicode String in Python

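The HTML source contains tons of characters like "&#58;" or "&#46;" (I had to put a space between &# and the number when posting, otherwise they would be displayed as ":" or "."). My question is: how do you convert them into the characters they should be in Python? Is there a built-in method or something?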

Hope someone can help me. Thanks!

2011-02-18 Shane

I don't know whether there is a built-in library for this, but here is a quick and dirty way to do it with a regular expression:

>>> import re
>>> re.sub(r"&#(\d+);", lambda x: unichr(int(x.group(1), 10)), "&#58; or &#46;")
u': or .'
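The session above is Python 2 (`unichr`). As a minimal sketch, here is the same regex approach under Python 3, where `chr` replaces `unichr`. Python 3.4 and later also ship a built-in, `html.unescape`, which handles decimal, hexadecimal and named references. The helper name `decode_numeric` is hypothetical and only for illustration.

```python
import html
import re

# Hypothetical helper: Python 3 port of the one-liner.
# Only decimal references such as "&#58;" are handled.
def decode_numeric(s):
    return re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), s)

print(decode_numeric("&#58; or &#46;"))          # : or .

# Standard library (Python 3.4+): also covers hex and named entities.
print(html.unescape("&#58; or &#x2e; &amp;"))    # : or . &
```

If you are on Python 3, `html.unescape` is usually the simpler choice; the regex version is only useful when you want to touch numeric references and nothing else.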


2011-02-18 11:52:41 YOU

Mark: Thank you! – Shane 2011-02-18 12:42:05

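Something like this will handle most entity definitions (assuming Python 2.x). It handles decimal and hexadecimal references, plus any named entity listed in htmlentitydefs.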

import re
from htmlentitydefs import name2codepoint  # Python 2 only

# Matches decimal (&#58;), hexadecimal (&#x3a;) and named (&amp;) entities
EntityPattern = re.compile(r'&(?:#(\d+)|(?:#x([\da-fA-F]+))|([a-zA-Z]+));')

def decodeEntities(s, encoding='utf-8'):
    def unescape(match):
        code = match.group(1)
        if code:                              # decimal reference
            return unichr(int(code, 10))
        code = match.group(2)
        if code:                              # hexadecimal reference
            return unichr(int(code, 16))
        code = match.group(3)
        if code in name2codepoint:            # known named entity
            return unichr(name2codepoint[code])
        return match.group(0)                 # unknown entity: leave unchanged

    if isinstance(s, str):
        s = s.decode(encoding)                # work on unicode, not bytes
    return EntityPattern.sub(unescape, s)
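For Python 3, a minimal sketch of the same function, assuming only the renames that Python 3 made: `htmlentitydefs` became `html.entities`, `unichr` became `chr`, and since `str` is already text, only `bytes` input needs decoding. The snake_case names are my own choice, not part of the original answer.

```python
import re
from html.entities import name2codepoint  # Python 3 name of htmlentitydefs

# Decimal (&#58;), hexadecimal (&#x3a;) and named (&amp;) references
ENTITY_PATTERN = re.compile(r'&(?:#(\d+)|#x([\da-fA-F]+)|([a-zA-Z]+));')

def decode_entities(s, encoding='utf-8'):
    def unescape(match):
        dec, hexa, name = match.groups()
        if dec:                               # decimal reference
            return chr(int(dec, 10))
        if hexa:                              # hexadecimal reference
            return chr(int(hexa, 16))
        if name in name2codepoint:            # known named entity
            return chr(name2codepoint[name])
        return match.group(0)                 # unknown entity: leave as-is

    if isinstance(s, bytes):                  # mirror the Python 2 str -> unicode step
        s = s.decode(encoding)
    return ENTITY_PATTERN.sub(unescape, s)

print(decode_entities("&#58; &#x2e; &amp; &bogus;"))  # : . & &bogus;
```

Unknown names such as `&bogus;` pass through untouched, which matches the behaviour of the Python 2 version.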


2011-02-18 11:54:15 Duncan
