UTF-8 in Python logging, how?

I'm trying to log a UTF-8 encoded string to a file using Python's logging package. As a toy example:

import logging 

def logging_test(): 
    handler = logging.FileHandler("/home/ted/logfile.txt", "w", 
            encoding = "UTF-8") 
    formatter = logging.Formatter("%(message)s") 
    handler.setFormatter(formatter) 
    root_logger = logging.getLogger() 
    root_logger.addHandler(handler) 
    root_logger.setLevel(logging.INFO) 

    # This is an o with a hat on it. 
    byte_string = '\xc3\xb4' 
    unicode_string = unicode("\xc3\xb4", "utf-8") 

    print "printed unicode object: %s" % unicode_string 

    # Explode 
    root_logger.info(unicode_string) 

if __name__ == "__main__": 
    logging_test()

This explodes with a UnicodeDecodeError on the logging.info() call.

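At a lower level, Python's logging package uses the codecs package to open the log file, passing in "UTF-8" as the encoding. That's all well and good, but it then tries to write byte strings to the file instead of unicode objects, which explodes. Essentially, Python is doing this: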

file_handler.write(unicode_string.encode("UTF-8"))

when it should be doing this:

file_handler.write(unicode_string)
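In Python 3 terms, the distinction drawn here looks like this: a text stream opened with an explicit encoding expects text (str) and does the encoding itself, while handing it already-encoded bytes is an error. A minimal sketch, assuming Python 3 and a throwaway temporary file (demo.txt is a hypothetical name):

```python
import os
import tempfile

# Python 3 sketch: a text stream opened with encoding="utf-8" expects str
# and performs the UTF-8 encoding itself when writing.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # hypothetical file name
text = "\u00f4"  # an o with a hat on it

with open(path, "w", encoding="utf-8") as f:
    f.write(text)  # correct: pass text and let the stream encode it

with open(path, "rb") as f:
    raw = f.read()
print(raw)  # b'\xc3\xb4', the two UTF-8 bytes for the character

# Passing bytes that were already encoded is rejected by a text stream.
try:
    with open(path, "w", encoding="utf-8") as f:
        f.write(text.encode("utf-8"))
    bytes_write_failed = False
except TypeError:
    bytes_write_failed = True
print(bytes_write_failed)  # True
```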

Is this a bug in Python, or am I taking crazy pills? FWIW, this is a stock Python 2.6 installation.


2009-10-09 Ted Dziuba

Your code works perfectly fine here. I tried hard to make it fail, but didn't succeed. – 2009-10-09 18:33:45

Right, Python encodes it with UTF-8 because it asks the output file which encoding to use, and you specified UTF-8, so that's all fine. – 2009-10-09 18:37:11

I had to hit up the Wayback Machine to find the [example](http://web.archive.org/web/20100107060919/http://tony.czechit.net/2009/02/unicode-support-for-python-logging-library/) you mentioned. Interesting. – Epu 2013-04-12 18:41:14
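For comparison, here is the toy example ported to Python 3, where the same FileHandler setup logs the character without error. A minimal sketch under these assumptions: Python 3, a temporary directory instead of /home/ted, and a hypothetical named logger (utf8_demo) instead of the root logger so the demo does not touch global logging state.

```python
import logging
import os
import tempfile

def logging_test(path):
    # The question's setup ported to Python 3 (assumption: Python 3; the
    # original targets Python 2.6, whose logging module had the bug).
    handler = logging.FileHandler(path, "w", encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger = logging.getLogger("utf8_demo")  # hypothetical logger name
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the demo away from the root logger

    # This is an o with a hat on it; a Python 3 str is already Unicode.
    unicode_string = b"\xc3\xb4".decode("utf-8")
    logger.info(unicode_string)  # the handler encodes it to UTF-8 itself

    # Detach and close so the file is flushed and the demo can be rerun.
    logger.removeHandler(handler)
    handler.close()

log_path = os.path.join(tempfile.mkdtemp(), "logfile.txt")
logging_test(log_path)

with open(log_path, "rb") as f:
    logged_bytes = f.read()
print(logged_bytes)  # b'\xc3\xb4\n' on POSIX systems
```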

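Check that you have the latest Python 2.6 - some Unicode bugs were found and fixed after 2.6 was released. For example, on my Ubuntu Jaunty system I copied and pasted your script, removing only the '/home/ted/' prefix from the log file name. The result (copied and pasted from a terminal window):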

 
~/projects/scratch$ python --version 
Python 2.6.2 
~/projects/scratch$ python utest.py 
printed unicode object: ô 
~/projects/scratch$ cat logfile.txt 
ô 
~/projects/scratch$

On Windows:

 
C:\temp>python --version 
Python 2.6.2 

C:\temp>python utest.py 
printed unicode object: ô

And the contents of the file:

[image: contents of logfile.txt]

This may also explain why Lennart Regebro couldn't reproduce it.


2009-10-09 19:14:49

Yes, that was it. It was a bug in the Python logging package that was fixed in later versions. – 2009-10-12 17:15:57

I'm running Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29) [GCC 4.2.1 (Apple Inc. build 5646)] on darwin on my iMac, and I still get the same error. Was the bug really fixed? – Tsf 2010-04-07 20:12:10

Yes, the fix went in between 2.6.1 and 2.6.2, in revision 69448: http://svn.python.org/view?view=rev&revision=69448 - so you need to upgrade to a later version. – 2010-04-08 21:12:47
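If you need to check whether a given interpreter should already include that fix, a version comparison is enough. A minimal sketch, assuming (per the comment above) that the fix first shipped in 2.6.2; has_logging_unicode_fix is a hypothetical helper name:

```python
import sys

def has_logging_unicode_fix(version_info=sys.version_info):
    # Assumption taken from the comment: the fix landed between 2.6.1
    # and 2.6.2 (r69448), so anything from 2.6.2 on should include it.
    return tuple(version_info[:3]) >= (2, 6, 2)

print(has_logging_unicode_fix((2, 6, 1)))  # False
print(has_logging_unicode_fix((2, 6, 2)))  # True
print(has_logging_unicode_fix())           # True on any Python 3
```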

Try this:

import logging 

def logging_test(): 
    log = open("./logfile.txt", "w") 
    handler = logging.StreamHandler(log) 
    formatter = logging.Formatter("%(message)s") 
    handler.setFormatter(formatter) 
    root_logger = logging.getLogger() 
    root_logger.addHandler(handler) 
    root_logger.setLevel(logging.INFO) 

    # This is an o with a hat on it. 
    byte_string = '\xc3\xb4' 
    unicode_string = unicode("\xc3\xb4", "utf-8") 

    print "printed unicode object: %s" % unicode_string 

    # Explode 
    root_logger.info(unicode_string.encode("utf8", "replace")) 


if __name__ == "__main__": 
    logging_test()

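For what it's worth, I expected to have to use codecs.open to open the file with UTF-8 encoding, but either that's the default or something else is going on, because it works as is.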


2009-10-09 18:17:29 John

NameError: global name 'unicode' is not defined – Gank 2016-04-03 16:04:12

@Gank You're using Python 3, I guess – warvariuc 2017-03-01 18:34:31
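In Python 3 the unicode() call and the manual .encode("utf8", "replace") step from the answer above are unnecessary; passing bytes to the logger would actually log their b'...' repr instead of the text. Instead, open the stream with an explicit encoding and pass str to the logger. A minimal sketch, assuming Python 3; the logger name stream_demo and the temporary path are hypothetical:

```python
import logging
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "logfile.txt")

# Open the stream ourselves, with an explicit encoding, and hand it to a
# StreamHandler; the stream (not our code) does the UTF-8 encoding.
log = open(path, "w", encoding="utf-8")
handler = logging.StreamHandler(log)
handler.setFormatter(logging.Formatter("%(message)s"))

logger = logging.getLogger("stream_demo")  # hypothetical logger name
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

logger.info("\u00f4")  # pass text, not pre-encoded bytes

logger.removeHandler(handler)
handler.close()  # StreamHandler.close() does not close the stream itself
log.close()

with open(path, encoding="utf-8") as f:
    contents = f.read()
```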

If I understand your problem correctly, the same issue should show up on your system when you simply do:

str(u'ô')

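I guess automatic encoding to the locale's encoding on Unix won't work until you have enabled the locale-aware if branch of the setencoding function in your site module. That file usually resides in /usr/lib/python2.x; it's worth inspecting anyway. AFAIK, the locale-aware setencoding is disabled by default (that's true for my Python 2.6 installation).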
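For contrast, Python 3 dropped this implicit ASCII step: the interpreter-wide default codec is always UTF-8, and str() of non-ASCII text involves no encoding at all. What can still depend on the locale is the default encoding of files opened without an explicit encoding= argument. A minimal sketch, assuming Python 3:

```python
import locale
import sys

# Python 3: the interpreter-wide default codec is fixed to UTF-8.
print(sys.getdefaultencoding())  # utf-8

# str() of text is a no-op conversion; no implicit ASCII encode happens.
s = str("\u00f4")

# The default for open() without encoding= can still follow the locale
# (unless UTF-8 mode is in effect), so it may differ between machines.
print(locale.getpreferredencoding(False))
```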

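The choices are:

- Let the system figure out the right way to encode Unicode strings to bytes, or do it in your code (this needs some configuration in a site-specific site.py)
- Encode Unicode strings in your code and output just bytes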
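The second choice, encoding in your own code and outputting only bytes, might look like this in Python 3: the file is opened in binary mode, so nothing is encoded implicitly and every write must already be bytes. A minimal sketch; the file name bytes_only.log is a hypothetical example:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "bytes_only.log")  # hypothetical name

message = "\u00f4"  # an o with a hat on it
with open(path, "wb") as f:  # binary mode: no implicit encoding
    f.write(message.encode("utf-8") + b"\n")  # encode explicitly

with open(path, "rb") as f:
    data = f.read()
print(data)  # b'\xc3\xb4\n'
```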

See also "The Illusive setdefaultencoding" by Ian Bicking and the related links.


2009-10-09 20:24:30

Having code like this:

raise Exception(u'щ')

causes:

File "/usr/lib/python2.7/logging/__init__.py", line 467, in format 
    s = self._fmt % record.__dict__ 
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)

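This happens because the format string is a byte string, while some of the format string arguments are unicode strings containing non-ASCII characters: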

>>> "%(message)s" % {'message': Exception(u'\u0449')} 
*** UnicodeEncodeError: 'ascii' codec can't encode character u'\u0449' in position 0: ordinal not in range(128)

Making the format string unicode fixes the problem:

>>> u"%(message)s" % {'message': Exception(u'\u0449')} 
u'\u0449'
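In Python 3 every string literal is already Unicode, so the same interpolation works without the u prefix. A minimal check, assuming Python 3:

```python
# Python 3: "%(message)s" is a str (Unicode), so formatting an exception
# whose message is non-ASCII no longer goes through the ASCII codec.
formatted = "%(message)s" % {"message": Exception("\u0449")}
print(ascii(formatted))  # '\u0449'
```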

So, in your logging configuration, make all format strings unicode:

'formatters': { 
    'simple': { 
        'format': u'%(asctime)-s %(levelname)s [%(name)s]: %(message)s', 
        'datefmt': '%Y-%m-%d %H:%M:%S', 
    }, 
...

and patch the default logging formatter to use a unicode format string:

logging._defaultFormatter = logging.Formatter(u"%(message)s")
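A complete, runnable version of a formatter configuration like the one above might look like this, as a sketch assuming Python 3 (where the u prefix is optional) and logging.config.dictConfig. The handler entry passes encoding through to logging.FileHandler's constructor. The logger name app and the temporary log path are illustrative assumptions, and asctime is left out of the format only so the output is predictable.

```python
import logging
import logging.config
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "app.log")  # illustrative path

logging.config.dictConfig({
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "simple": {
            "format": "%(levelname)s [%(name)s]: %(message)s",
            "datefmt": "%Y-%m-%d %H:%M:%S",
        },
    },
    "handlers": {
        "file": {
            "class": "logging.FileHandler",
            "filename": log_path,
            "mode": "w",
            "encoding": "utf-8",  # passed to FileHandler's constructor
            "formatter": "simple",
        },
    },
    "loggers": {
        "app": {"handlers": ["file"], "level": "INFO", "propagate": False},
    },
})

logger = logging.getLogger("app")  # hypothetical logger name
logger.info("\u0449")

# Close the handler so everything is flushed before reading the file back.
for h in list(logger.handlers):
    logger.removeHandler(h)
    h.close()

with open(log_path, encoding="utf-8") as f:
    contents = f.read()
```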


2014-03-11 08:31:51 warvariuc

What about Python 3.5? Shouldn't all strings be unicode by default? – 2016-12-21 19:45:25

@JanuszSkonieczny Do you have the same problem with Python 3? – warvariuc 2016-12-22 02:53:45

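Yes, I was in a Docker container. I solved it by setting a number of environment variables tied to the OS encoding. For anyone who runs into the same problem, see http://stackoverflow.com/a/27931669/260480. – 2016-12-29 11:28:57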
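The exact variables from the linked answer are not reproduced here. As one illustration of the general idea, assuming Python 3.7+ (for capture_output), the documented PYTHONIOENCODING variable fixes the encoding of a child interpreter's standard streams regardless of the container's locale:

```python
import os
import subprocess
import sys

# Run a child interpreter that prints a non-ASCII character, with its
# standard-stream encoding forced to UTF-8 via PYTHONIOENCODING.
env = dict(os.environ, PYTHONIOENCODING="utf-8")
result = subprocess.run(
    [sys.executable, "-c", "print('\\u0449')"],
    env=env,
    capture_output=True,
    check=True,
)
print(result.stdout)  # b'\xd1\x89\n' on POSIX systems
```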
