Django encoding error when importing from CSV

When I try to run the following while reading a CSV:

import csv

with open('data.csv', 'rU') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        pgd = Player.objects.get_or_create(
            player_name=row['Player'],
            team=row['Team'],
            position=row['Position']
        )

Most of my data gets created in the database, except for one particular row. When my script reaches that row, I get this error:

ProgrammingError: You must not use 8-bit bytestrings unless you use a
text_factory that can interpret 8-bit bytestrings (like text_factory = str).
It is highly recommended that you instead just switch your application to Unicode strings.

The specific row in the CSV that causes this error is:

>>> row 
{'FR\xed\x8aD\xed\x8aRIC.ST-DENIS', 'BOS', 'G'}

I've looked at other threads with the same or a similar problem, but most of them aren't specific to using SQLite with Django. Any suggestions?

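In case it matters: I run the script by entering the Django shell with python manage.py shell and copy-pasting it in, rather than invoking the script from the command line.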

Here is the stack trace I get:

Traceback (most recent call last):
  File "<console>", line 4, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/csv.py", line 108, in next
    row = self.reader.next()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 302, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xcc in position 1674: invalid continuation byte

EDIT: Based on Alastair McCormack's feedback, I decided to just import this entry into my database manually rather than trying to read it from my CSV.

Based on the output from your question, it looks like the person who made the CSV mojibaked it - it doesn't seem to represent FRÉDÉRIC.ST-DENIS. You can try using windows-1252 instead of utf-8, but I think you'll end up with FRíŠDíŠRIC.ST-DENIS in your database.
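A minimal Python 3 sketch of that suggestion, assuming the raw bytes are exactly those shown in the question's `row` output: decoding them as UTF-8 fails, while windows-1252 (`cp1252`) succeeds but produces the mojibake predicted above. The helper name `decode_with_fallback` and the order of candidate encodings are illustrative choices, not part of the original answer.

```python
# Bytes copied from the problem row shown in the question.
RAW = b'FR\xed\x8aD\xed\x8aRIC.ST-DENIS'


def decode_with_fallback(raw, encodings=('utf-8', 'cp1252')):
    """Hypothetical helper: return (text, encoding) for the first encoding that decodes raw."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # this encoding rejected the bytes; try the next candidate
    raise ValueError('none of %r could decode the input' % (encodings,))


text, used = decode_with_fallback(RAW)
# used == 'cp1252', text == 'FRíŠDíŠRIC.ST-DENIS' (mojibake, not FRÉDÉRIC)
```

A fallback like this only makes the import succeed; it cannot recover the intended name if the file was already mangled when it was created.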


2017-09-30 Konrad

Python 2.x or 3.x? –

Python 2.x, but this is a new project, so if switching to 3.x would make my life easier, I'll do that. – Konrad

Encode the player name as UTF-8 with .encode('utf-8') when importing the CSV:

with open('data.csv', 'rU') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        pgd = Player.objects.get_or_create(
            player_name=row['Player'].encode('utf-8'),
            team=row['Team'],
            position=row['Position']
        )


2017-09-30 10:08:07

When I add the encode, I get the error: 'UnicodeDecodeError: 'ascii' codec can't decode byte 0xcc in position 2: ordinal not in range(128).' – Konrad

That's because the file is already 8-bit encoded; '.encode()' makes no sense here. –

I suspect you're using Python 2, where open() returns str objects, which are just byte strings.

The error is telling you that you need to decode your text into Unicode strings before you use it.
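To make the direction of the conversion concrete, here is a small Python 3 sketch (in Python 3 terms, Py2's `str` corresponds to `bytes` and Py2's `unicode` to `str`). It assumes the bytes are UTF-8 encoded; it is an illustration, not code from the answer.

```python
# Bytes as they would come from a UTF-8 encoded file: "FRÉDÉRIC"
# (É is U+00C9, encoded in UTF-8 as 0xC3 0x89).
raw = b'FR\xc3\x89D\xc3\x89RIC'

# decode: bytes -> text. This is the step the answer says is missing.
name = raw.decode('utf-8')
assert name == 'FR\u00c9D\u00c9RIC'

# encode: text -> bytes. The reverse direction, not what you need when reading.
assert name.encode('utf-8') == raw

# In Python 3, bytes objects have no .encode() method at all, so calling
# .encode() on raw file data fails loudly instead of silently decoding as ASCII.
assert not hasattr(raw, 'encode')
```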

The simplest way is to decode each cell:

import csv

with open('data.csv', 'r') as csvfile:  # 'U' means universal newline mode and is not necessary
    reader = csv.DictReader(csvfile)
    for row in reader:
        pgd = Player.objects.get_or_create(
            player_name=row['Player'].decode('utf-8'),
            team=row['Team'].decode('utf-8'),
            position=row['Position'].decode('utf-8')
        )

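This will work, but adding .decode() everywhere is ugly, and it won't work in Python 3. Python 3 improves things by opening files in text mode and returning Python 3 strings, which are the equivalent of Unicode strings in Python 2.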
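A minimal Python 3 sketch of that text-mode approach, assuming the file really is UTF-8: `open()` accepts an `encoding` argument, and the `csv` module documentation recommends `newline=''`. The function `read_players` is hypothetical. In the question's setting, each yielded dict would be passed to `Player.objects.get_or_create(**kwargs)`; that call is left as a comment so the sketch runs without Django.

```python
import csv
import os
import tempfile


def read_players(path, encoding='utf-8'):
    """Hypothetical helper: yield one dict of already-decoded text per CSV row."""
    # Text mode with an explicit encoding: every cell arrives as str (Unicode).
    # newline='' lets the csv module handle line endings itself.
    with open(path, 'r', encoding=encoding, newline='') as csvfile:
        for row in csv.DictReader(csvfile):
            yield {
                'player_name': row['Player'],
                'team': row['Team'],
                'position': row['Position'],
            }


if __name__ == '__main__':
    # Write a small UTF-8 sample file so the sketch is self-contained.
    fd, path = tempfile.mkstemp(suffix='.csv')
    with os.fdopen(fd, 'w', encoding='utf-8', newline='') as f:
        f.write('Player,Team,Position\r\nFR\u00c9D\u00c9RIC.ST-DENIS,BOS,G\r\n')
    try:
        for kwargs in read_players(path):
            # In Django this would be: Player.objects.get_or_create(**kwargs)
            print(kwargs['team'], kwargs['position'])
    finally:
        os.remove(path)
```

If the file is not valid UTF-8, iterating `read_players` raises `UnicodeDecodeError`, which is the same failure the question runs into.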

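To get the same behaviour in Python 2, use the io module. It gives you an open() function with an encoding option. Annoyingly, the Python 2.x csv module is broken with Unicode, so you need to install a backported version: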

pip install backports.csv

To tidy up your code and future-proof it, do this:

import io
from backports import csv

with io.open('data.csv', 'r', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        # now every row is automatically decoded from UTF-8
        pgd = Player.objects.get_or_create(
            player_name=row['Player'],
            team=row['Team'],
            position=row['Position']
        )


2017-09-30 10:27:21

When I add the decode, I get this error: UnicodeDecodeError: 'utf8' codec can't decode byte 0xcc in position 2: invalid continuation byte. I'm going to try your backports idea. – Konrad

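Using backports didn't work either. It gives me the error 'UnicodeDecodeError: 'utf8' codec can't decode byte 0xcc in position 1674: invalid continuation byte' on the same troublesome record. I also had to use 'from io import open'. – Konrad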

Ah, I assumed the CSV was UTF-8 encoded. What encoding is the CSV in? –
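One way to start answering that is to look at the raw bytes around the offset the traceback reports (1674). This is a minimal Python 3 sketch, assuming the file is small enough to read into memory; `show_bad_bytes` is a hypothetical helper. It relies on `UnicodeDecodeError` exposing `start`, `end` and `reason`, so the offset does not have to be parsed out of the error message.

```python
def show_bad_bytes(data, encoding='utf-8', context=10):
    """Hypothetical helper: return None if data decodes cleanly; otherwise
    describe where decoding failed and show the surrounding raw bytes."""
    try:
        data.decode(encoding)
        return None
    except UnicodeDecodeError as exc:
        lo = max(exc.start - context, 0)
        return {
            'position': exc.start,                 # same offset the traceback reports
            'reason': exc.reason,                  # e.g. 'invalid continuation byte'
            'bad_bytes': data[exc.start:exc.end],  # the byte(s) the codec rejected
            'context': data[lo:exc.end + context], # neighbouring bytes, for inspection
        }


# Usage against the file from the question (read in binary mode so nothing is decoded):
# with open('data.csv', 'rb') as f:
#     print(show_bad_bytes(f.read()))
```

Seeing the neighbouring bytes makes it easier to judge whether the file is some other single-byte encoding or simply contains a damaged record.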
