2016-12-28 77 views

TypeError: a bytes-like object is required, not 'str' in Python 3.5.2 and pytesseract

I am using Python 3.5.2 with pytesseract. When I run my code, I get the error TypeError: a bytes-like object is required, not 'str' (details below).

Code (File "D:/test.py"):

# -*- coding: utf-8 -*- 

try: 
    import Image 
except ImportError: 
    from PIL import Image 

import pytesseract 


print(pytesseract.image_to_string(Image.open('d:/testimages/name.gif'), lang='chi_sim')) 
print(pytesseract.image_to_string(Image.open('d:/testimages/mobile.gif'))) 

Error:

Traceback (most recent call last): 
    File "D:/test.py", line 11, in <module> 
    print(pytesseract.image_to_string(Image.open('d:/testimages/name.gif'), lang='chi_sim')) 
    File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 164, in image_to_string 
    errors = get_errors(error_string) 
    File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 112, in get_errors 
    error_lines = tuple(line for line in lines if line.find('Error') >= 0) 
    File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 112, in <genexpr> 
    error_lines = tuple(line for line in lines if line.find('Error') >= 0) 
TypeError: a bytes-like object is required, not 'str' 

What should I do?

Edit:

I downloaded the training data to C:\Program Files (x86)\Tesseract-OCR\tessdata, and I inserted the line error_string = error_string.decode("utf-8") into get_errors(). The error is now:

Traceback (most recent call last): 
    File "D:/test.py", line 11, in <module> 
    print(pytesseract.image_to_string(Image.open('d:/testimages/name.gif'), lang='chi_sim')) 
    File "C:\Users\dell\AppData\Local\Programs\Python\Python35\lib\site-packages\pytesseract\pytesseract.py", line 165, in image_to_string 
    raise TesseractError(status, errors) 
pytesseract.pytesseract.TesseractError: (1, 'Error opening data file \\Program Files (x86)\\Tesseract-OCR\\tessdata/chi_sim.traineddata') 

Answer


This is a known bug in pytesseract; see issue #32:

Error parsing of tesseract output is brittle: a bytes-like object is required, not 'str'

There actually is an error in tesseract. But on the Python end the error occurs because error_string is returning a byte-literal, and the geterrors call appears to have trouble with it
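
The failure described in the issue can be reproduced without Tesseract installed. The sketch below uses a made-up byte string as a stand-in for the stderr output that pytesseract reads back; the variable names are illustrative and not taken from the library. It shows that calling find() with a str argument on a bytes object raises a TypeError in Python 3, and that decoding first avoids it. The exact wording of the exception message differs between Python versions.

```python
# Stand-in for the raw stderr bytes returned by the tesseract process
# (the content here is invented for illustration).
error_string = b"Error opening data file chi_sim.traineddata\nsecond line"

lines = error_string.splitlines()  # a list of bytes objects, not str

message = None
try:
    # Same pattern as in get_errors(): a str needle searched in bytes lines.
    matches = [line for line in lines if line.find('Error') >= 0]
except TypeError as exc:
    message = str(exc)
    print("TypeError:", message)

# Decoding to str first makes the same search work.
decoded = error_string.decode("utf-8")
found = [line for line in decoded.splitlines() if line.find('Error') >= 0]
print(found)
```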

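The workaround is either to install the training data for the given language (see "Tesseract running error"), or to edit site-packages\pytesseract\pytesseract.py and insert one extra line at the top of the get_errors() function (at line 109):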

error_string = error_string.decode("utf-8") 

The function then reads:

def get_errors(error_string): 
    ''' 
    returns all lines in the error_string that start with the string "error" 
    ''' 

    error_string = error_string.decode("utf-8") 
    lines = error_string.splitlines() 
    error_lines = tuple(line for line in lines if line.find('Error') >= 0) 
    if len(error_lines) > 0: 
        return '\n'.join(error_lines) 
    else: 
        return error_string.strip() 
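
If you would rather not assume that the input is always bytes, a slightly more defensive variant is possible. This is only a sketch: get_errors_tolerant is a hypothetical name, not part of pytesseract, and the use of errors="replace" is my own assumption to cope with console output that is not valid UTF-8.

```python
def get_errors_tolerant(error_string):
    """Hypothetical variant of get_errors() that accepts bytes or str.

    Returns the lines containing 'Error', or the whole stripped text
    when no such line is present.
    """
    if isinstance(error_string, bytes):
        # errors="replace" substitutes undecodable bytes instead of raising
        # UnicodeDecodeError (an assumption, see the note above).
        error_string = error_string.decode("utf-8", errors="replace")

    lines = error_string.splitlines()
    error_lines = tuple(line for line in lines if 'Error' in line)
    if error_lines:
        return '\n'.join(error_lines)
    return error_string.strip()


if __name__ == "__main__":
    print(get_errors_tolerant(b"info line\nError opening data file x\n"))
    print(get_errors_tolerant("  already a str  \n"))
```

With this variant the function keeps working if a later version of the library starts passing str instead of bytes.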

There is still another problem; please see my edit. – zwl1619


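@zwl1619: I don't know how pytesseract works internally. Fixing the encoding error has revealed that the training data is not installed the way Tesseract expects. That error was being raised before as well, but because of the encoding problem you never got to see it. Maybe it is some kind of permissions issue?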
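
One way to narrow this down is to check from Python whether the language file is where you think it is and whether it is readable. The sketch below is a diagnostic aid, not a confirmed fix: find_traineddata is a hypothetical helper, and the directory is the one given in the question. It also prints the TESSDATA_PREFIX environment variable, which Tesseract consults when locating language data; whether that variable should point at the tessdata directory itself or at its parent depends on the Tesseract version. Note that the path in the error message has no drive letter, which may be worth comparing against whatever TESSDATA_PREFIX contains.

```python
import os


def find_traineddata(lang, tessdata_dir):
    """Return (expected_path, is_readable) for <lang>.traineddata.

    Hypothetical helper for diagnosis only; not part of pytesseract.
    """
    path = os.path.join(tessdata_dir, lang + ".traineddata")
    readable = os.path.isfile(path) and os.access(path, os.R_OK)
    return path, readable


if __name__ == "__main__":
    # Directory taken from the question; adjust for your installation.
    tessdata_dir = r"C:\Program Files (x86)\Tesseract-OCR\tessdata"
    print("TESSDATA_PREFIX =", os.environ.get("TESSDATA_PREFIX"))
    for lang in ("chi_sim", "eng"):
        path, ok = find_traineddata(lang, tessdata_dir)
        print(path, "-> readable" if ok else "-> missing or unreadable")
```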
