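I found a solution. First, I registered a new codec error handler that writes hexadecimal character references (&#xE9;) instead of decimal ones (&#233;). Then I monkey-patched ElementTree._get_writer() so that it uses the new handler instead of xmlcharrefreplace. It looks like this: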
```python
from xml.etree import ElementTree
import io
import contextlib
import codecs


def html_replace(exc):
    """Codec error handler that replaces unencodable characters with
    hexadecimal character references (&#xE9;) rather than decimal ones."""
    if isinstance(exc, (UnicodeEncodeError, UnicodeTranslateError)):
        s = []
        for c in exc.object[exc.start:exc.end]:
            # lowercase "x" followed by uppercase hex digits, e.g. U+00E9 -> "&#xE9;"
            s.append('&#x%X;' % ord(c))
        return ''.join(s), exc.end
    else:
        # exception instances have no __name__ attribute; use the class name
        raise TypeError("can't handle %s" % type(exc).__name__)


codecs.register_error('html_replace', html_replace)


# Monkey-patch this Python function to prevent it from using xmlcharrefreplace.
# It mirrors ElementTree's private helper but passes errors='html_replace'.
# Because _get_writer is private, its signature and what it yields may differ
# between Python versions: compare it with xml/etree/ElementTree.py in your own
# Python before patching.
@contextlib.contextmanager
def _get_writer(file_or_filename, encoding):
    # returns text write method and release all resources after using
    try:
        write = file_or_filename.write
    except AttributeError:
        # file_or_filename is a file name
        if encoding == "unicode":
            file = open(file_or_filename, "w")
        else:
            file = open(file_or_filename, "w", encoding=encoding,
                        errors="html_replace")
        with file:
            yield file.write
    else:
        # file_or_filename is a file-like object
        # encoding determines if it is a text or binary writer
        if encoding == "unicode":
            # use a text writer as is
            yield write
        else:
            # wrap a binary writer with TextIOWrapper
            with contextlib.ExitStack() as stack:
                if isinstance(file_or_filename, io.BufferedIOBase):
                    file = file_or_filename
                elif isinstance(file_or_filename, io.RawIOBase):
                    file = io.BufferedWriter(file_or_filename)
                    # Keep the original file open when the BufferedWriter is
                    # destroyed
                    stack.callback(file.detach)
                else:
                    # This is to handle passed objects that aren't in the
                    # IOBase hierarchy, but just have a write method
                    file = io.BufferedIOBase()
                    file.writable = lambda: True
                    file.write = write
                    try:
                        # TextIOWrapper uses these methods to determine
                        # if a BOM (for UTF-16, etc.) should be added
                        file.seekable = file_or_filename.seekable
                        file.tell = file_or_filename.tell
                    except AttributeError:
                        pass
                file = io.TextIOWrapper(file,
                                        encoding=encoding,
                                        errors='html_replace',
                                        newline="\n")
                # Keep the original file open when the TextIOWrapper is
                # destroyed
                stack.callback(file.detach)
                yield file.write


# ElementTree.write() looks up _get_writer as a module global when it is called,
# so every later tree.write(...) (default encoding: us-ascii) uses the patch.
ElementTree._get_writer = _get_writer
```
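If patching a private function feels too fragile, one possible alternative (a sketch, not the approach used above) is to let ElementTree serialize to a str with encoding="unicode" and do the ASCII encoding yourself with an equivalent error handler. The handler name hex_charref_replace and the helper to_ascii_xml are hypothetical; the sketch assumes the whole document fits in memory and does not emit an XML declaration.

```python
import codecs
from xml.etree import ElementTree


def hex_charref_replace(exc):
    """Hypothetical handler: replace unencodable characters with &#x..; references."""
    if isinstance(exc, UnicodeEncodeError):
        chunk = exc.object[exc.start:exc.end]
        return ''.join('&#x%X;' % ord(c) for c in chunk), exc.end
    raise TypeError("can't handle %s" % type(exc).__name__)


# Hypothetical handler name chosen for this sketch.
codecs.register_error('hex_charref_replace', hex_charref_replace)


def to_ascii_xml(root):
    """Hypothetical helper: serialize root to ASCII bytes with hex character references."""
    # With encoding="unicode", tostring() returns a str in which only markup
    # characters are escaped; non-ASCII text is left as-is.
    text = ElementTree.tostring(root, encoding="unicode")
    # Encoding to ASCII sends every non-ASCII character through the handler.
    return text.encode("us-ascii", errors="hex_charref_replace")


root = ElementTree.Element("entry", type="song")
title = ElementTree.SubElement(root, "title")
title.text = "Café Tacvba"
print(to_ascii_xml(root))
# → b'<entry type="song"><title>Caf&#xE9; Tacvba</title></entry>'
```

This keeps the standard library untouched, at the cost of building the full string in memory before writing it out with open(path, "wb").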
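That's annoying. Sorry, I don't know ElementTree well enough to answer your question. (FWIW, my e-reader handles decimal entities better than hex ones, so I have the opposite problem.) If you don't find a way to force hex, it's easy to convert the decimal entities to hex with a regex. OTOH, these days most devices have good UTF-8 support, so you could convert those entities to Unicode characters and encode the output file as UTF-8.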
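The regex conversion suggested in this comment could look roughly like the sketch below. decimal_to_hex_charrefs is a hypothetical helper, and it assumes the input is ElementTree's own serialized output, where a literal "&#233;" in the data would already have been escaped to "&amp;#233;" and is therefore left alone.

```python
import re

# Matches decimal character references such as "&#233;".
DECIMAL_CHARREF = re.compile(r'&#([0-9]+);')


def decimal_to_hex_charrefs(xml_text):
    """Hypothetical helper: rewrite each decimal character reference in hex."""
    return DECIMAL_CHARREF.sub(lambda m: '&#x%X;' % int(m.group(1)), xml_text)


print(decimal_to_hex_charrefs('<title>Caf&#233; Tacvba</title>'))
# → <title>Caf&#xE9; Tacvba</title>
```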
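I don't want to change the format of the database file by using a different encoding or different code points. I want it to stay fully compatible with Rhythmbox's format. – moorepants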
That makes sense. OTOH, I'd be surprised if Rhythmbox didn't use UTF-8 for its XML files. Of course, ASCII is a subset of UTF-8, so it's fine to keep your XML strictly ASCII even if Rhythmbox does support UTF-8.
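The point that strict-ASCII XML is also valid UTF-8 can be checked directly. This is a minimal sketch using a made-up fragment: an ASCII-only document decodes identically as ASCII or UTF-8, and a UTF-8 XML parser still resolves the hex reference back to the original character.

```python
from xml.etree import ElementTree

# Made-up ASCII-only fragment that uses a hexadecimal character reference.
ascii_xml = '<title>Caf&#xE9; Tacvba</title>'.encode('us-ascii')

# Bytes 0x00-0x7F mean the same code points in ASCII and UTF-8,
# so both decodings produce the same text.
assert ascii_xml.decode('utf-8') == ascii_xml.decode('us-ascii')

# With no XML declaration the parser assumes UTF-8, and the character
# reference is resolved to the original character.
print(ElementTree.fromstring(ascii_xml).text)
# → Café Tacvba
```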