TypeError: a bytes-like object is required, not 'str' in Python 3.5.2 and pytesseract

I am using Python 3.5.2 with pytesseract, and I get the error TypeError: a bytes-like object is required, not 'str' when I run my code (details below):

Code: File "D:/test.py"

# -*- coding: utf-8 -*- 

try: 
    import Image 
except ImportError: 
    from PIL import Image 

import pytesseract 


print(pytesseract.image_to_string(Image.open('d:/testimages/name.gif'), lang='chi_sim')) 
print(pytesseract.image_to_string(Image.open('d:/testimages/mobile.gif')))

Error:

Traceback (most recent call last): 
    File "D:/test.py", line 11, in <module> 
    print(pytesseract.image_to_string(Image.open('d:/testimages/name.gif'), lang='chi_sim')) 
    File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 164, in image_to_string 
    errors = get_errors(error_string) 
    File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 112, in get_errors 
    error_lines = tuple(line for line in lines if line.find('Error') >= 0) 
    File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 112, in <genexpr> 
    error_lines = tuple(line for line in lines if line.find('Error') >= 0) 
TypeError: a bytes-like object is required, not 'str'

What should I do?

Edit:

I downloaded the training data to C:\Program Files (x86)\Tesseract-OCR\tessdata and inserted the line error_string = error_string.decode("utf-8") into get_errors(). The error now looks like this:

Traceback (most recent call last): 
    File "D:/test.py", line 11, in <module> 
    print(pytesseract.image_to_string(Image.open('d:/testimages/name.gif'), lang='chi_sim')) 
    File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 165, in image_to_string 
    raise TesseractError(status, errors) 
pytesseract.pytesseract.TesseractError: (1, 'Error opening data file \\Program Files (x86)\\Tesseract-OCR\\tessdata/chi_sim.traineddata')


2016-12-28 zwl1619

This is a known bug in pytesseract, see issue #32:

Error parsing of tesseract output is brittle: a bytes-like object is required, not 'str'

and

There actually is an error in tesseract. But on the Python end the error occurs because error_string is returning a byte-literal, and the geterrors call appears to have trouble with it
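The mechanism described in that quote can be reproduced without pytesseract. The following is a minimal sketch, assuming the error output arrives as bytes (as it typically does when read from a subprocess in Python 3); the sample message is made up for illustration. The exact wording of the TypeError differs between Python versions, so the sketch only records that one was raised.

```python
# Stand-in for the stderr captured from the tesseract process (illustrative text).
error_string = b"Error opening data file chi_sim.traineddata\n"

lines = error_string.splitlines()  # a list of bytes objects, not str

try:
    # Same kind of call as in get_errors(): a str argument on a bytes object.
    lines[0].find('Error')
    message = None
except TypeError as exc:
    message = str(exc)

print("TypeError raised:", message is not None)

# After decoding, the same search works on str.
decoded = error_string.decode("utf-8")
found = decoded.splitlines()[0].find('Error')
print("index of 'Error' after decoding:", found)
```

Searching with a bytes pattern (b'Error') would also avoid the exception, but decoding once keeps the rest of the function working with ordinary strings.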

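The workaround is either to install the training data for the given language (see "Tesseract running error"), or to edit site-packages\pytesseract\pytesseract.py and insert an extra line at the top of the get_errors() function (at line 109):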

error_string = error_string.decode("utf-8")

The function then reads:

def get_errors(error_string): 
    ''' 
    returns all lines in the error_string that start with the string "error" 
    ''' 

    error_string = error_string.decode("utf-8") 
    lines = error_string.splitlines() 
    error_lines = tuple(line for line in lines if line.find('Error') >= 0) 
    if len(error_lines) > 0:
        return '\n'.join(error_lines)
    else:
        return error_string.strip()
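If the same function might also be called with a str (for example on Python 2, or after the library changes), an unconditional decode would fail in the other direction. A slightly more defensive variant could look like the sketch below. The name get_errors_safe is hypothetical and not part of pytesseract; it is only meant to show the idea.

```python
def get_errors_safe(error_string):
    """Hypothetical variant of get_errors() that accepts bytes or str.

    Returns the lines containing "Error", or the whole stripped message
    if no such line exists.
    """
    if isinstance(error_string, bytes):
        # errors="replace" avoids a second exception if the output
        # is not valid UTF-8.
        error_string = error_string.decode("utf-8", errors="replace")

    lines = error_string.splitlines()
    error_lines = tuple(line for line in lines if 'Error' in line)
    if error_lines:
        return '\n'.join(error_lines)
    return error_string.strip()


# Works for both input types.
print(get_errors_safe(b"warming up\nError opening data file\n"))
print(get_errors_safe("  no problems reported  "))
```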


2016-12-28 19:51:22

There is still another problem, please see my edit. – zwl1619

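@zwl1619: I don't know how pytesseract works internally. Fixing the encoding error reveals that the training data is not installed the way Tesseract expects. That error was being raised before as well, but because of the encoding problem you never got to see it. Perhaps this is some kind of permissions issue?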
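One way to narrow this down is to check from Python whether the file Tesseract is looking for exists and is readable. The helper below is a rough sketch; check_traineddata is a hypothetical name, and the directory passed in is simply the one mentioned in the question. Note also that the path in the error message has no drive letter while the script runs from D:, so it may be worth checking which directory Tesseract actually resolves.

```python
import os


def check_traineddata(tessdata_dir, lang):
    """Hypothetical helper: report whether <lang>.traineddata is usable."""
    path = os.path.join(tessdata_dir, lang + ".traineddata")
    return {
        "path": path,
        "exists": os.path.isfile(path),        # is the file there at all?
        "readable": os.access(path, os.R_OK),  # can this user read it?
    }


if __name__ == "__main__":
    # Directory taken from the question; adjust to the actual install location.
    tessdata_dir = r"C:\Program Files (x86)\Tesseract-OCR\tessdata"
    print(check_traineddata(tessdata_dir, "chi_sim"))
```

If "exists" is False, the file is missing or in a different directory; if it exists but "readable" is False, a permissions problem is more likely.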
