2012-02-11 87 views
3

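Inserting unicode into sqlite?

I'm still learning Python, and as a small project I wrote a script that inserts values from a text file into a sqlite3 database. But some of the names contain odd letters (I guess you would call them non-ASCII), and they raise an error when they come up. Here is my little script (and please tell me if there is any way to make it more Pythonic):

import sqlite3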

f = open('complete', 'r') 
fList = f.readlines() 
conn = sqlite3.connect('tpb') 
cur = conn.cursor() 

for i in fList: 
    exploaded = i.split('|') 
    eList = (
     (exploaded[1], exploaded[5]) 
    ) 
    cur.execute('INSERT INTO magnets VALUES(?, ?)', eList) 
    conn.commit() 
cur.close() 

And it produces this error:

Traceback (most recent call last): 
    File "C:\Users\Admin\Desktop\sortinghat.py", line 13, in <module> 
    cur.execute('INSERT INTO magnets VALUES(?, ?)', eList) 
sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.

Maybe this will help: "Error inserting (Unicode?) text into sqlite3 database", http://stackoverflow.com/questions/4939169/error-inserting-unicode-text-into-sqlite3-database – keno 2012-02-11 05:54:38

Answers

4

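To get the contents of the file as unicode, you need to decode them from whatever encoding the file is in.
It looks like you're on Windows, so a good bet is cp1252.
If you got the file from somewhere else, all bets are off.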
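As a rough illustration of why the guess matters, here is a small sketch. The sample bytes are made up for this example and are not taken from the asker's file. It decodes the same byte string with cp1252 and then tries utf-8:

```python
# Hypothetical sample: the text "Beyonce" with an accented final e,
# as a cp1252 writer would store it (0xE9 is e-acute in cp1252).
raw = b'Beyonc\xe9'

# Decoding with the matching codec recovers the intended text.
as_cp1252 = raw.decode('cp1252')

# The same bytes are not valid UTF-8, so the wrong guess fails loudly here.
try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(repr(as_cp1252), utf8_ok)
```

A wrong guess does not always raise. Some codecs will decode almost any bytes into the wrong characters without complaint, so it is worth checking a few known names by eye.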

Once the encoding is sorted out, an easy way to decode is to use the codecs module, like so:

import codecs
# ...
with codecs.open('complete', encoding='cp1252') as fin:  # or utf-8 or whatever
    for line in fin:
        fields = line.split('|')  # split each line only once
        to_insert = (fields[1], fields[5])
        cur.execute('INSERT INTO magnets VALUES (?,?)', to_insert)
        conn.commit()
# ...
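For anyone who wants to try the whole flow end to end, the following is a minimal self-contained sketch, not the asker's actual setup. The two-column schema, the `load_magnets` helper, the sample line, and the use of an in-memory database are all assumptions made for illustration. It uses `io.open`, which accepts an `encoding` argument in the same way as `codecs.open`, and it strips the line ending in case the sixth field is the last one on the line.

```python
import io
import os
import sqlite3
import tempfile


def load_magnets(path, conn, encoding='cp1252'):
    """Insert fields 1 and 5 of each '|'-separated line into magnets.

    Hypothetical helper for this sketch; returns the number of rows inserted.
    """
    rows = []
    with io.open(path, encoding=encoding) as fin:
        for line in fin:
            fields = line.rstrip('\r\n').split('|')
            if len(fields) > 5:  # skip short or malformed lines
                rows.append((fields[1], fields[5]))
    with conn:  # commits once on success, rolls back on error
        conn.executemany('INSERT INTO magnets VALUES (?, ?)', rows)
    return len(rows)


# Demo with made-up data: a temporary cp1252 file and an in-memory database.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE magnets (name TEXT, link TEXT)')  # assumed schema

fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as out:
    out.write(u'0|Caf\xe9|a|b|c|magnet:?xt=demo\n'.encode('cp1252'))

inserted = load_magnets(path, conn)
os.remove(path)
stored = conn.execute('SELECT name, link FROM magnets').fetchall()
```

Collecting the rows and calling `executemany` inside `with conn:` gives a single commit for the whole file, which is usually faster than committing after every row. On Python 3 the built-in `open` takes the same `encoding` argument, so `io.open` is only needed if the script must also run on Python 2.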

It works perfectly, thank you so much! And thanks for showing me the 'with' statement, I had never thought of it! – 2012-02-11 06:23:17

You made it easy by including the code and the full traceback. Cheers, and welcome to SO. – bernie 2012-02-11 08:00:57