2016-04-26 106 views
3

Segmentation fault while calling a cpp function from Python

I want to call this cpp function from Python:

TESS_API BOOL TESS_CALL TessBaseAPIProcessPages(TessBaseAPI* handle, const char* filename, 
    const char* retry_config, int timeout_millisec, TessResultRenderer* renderer) 
{ 
    if (handle->ProcessPages(filename, retry_config, timeout_millisec, renderer)) 
     return TRUE; 
    else 
     return FALSE; 
} 

The last parameter of this function is a TessResultRenderer. There is another cpp function that creates a TessResultRenderer:

TESS_API TessResultRenderer* TESS_CALL TessTextRendererCreate(const char* outputbase) 
{ 
    return new TessTextRenderer(outputbase); 
} 

To call this from my Python code, I did the following:

outputbase = "stdout" 
renderer = tesseract.TessTextRendererCreate(outputbase) 
text_out = tesseract.TessBaseAPIProcessPages(api, 
    ctypes.create_string_buffer(path), 
    None, 0, renderer)  # Segmentation fault (core dumped) error on this line

but I keep getting a Segmentation fault error.

My question is: how can I call TessBaseAPIProcessPages from Python?

Some more reference links to the code base:

referer api

Implementation of processPages(...)

Edit

After trying the suggestion from the comments, I did the following, but I get an error: item 1 in _argtypes_ has no from_param method

PTessResultRenderer = ctypes.POINTER(TessResultRenderer) 
self.tesseract.TessTextRendererCreate.restype = PTessResultRenderer 
outputbase = "stdout" 
self.tesseract.TessTextRendererCreate.argtypes = [outputbase] #error here 
self.tesseract.TessTextRendererCreate 

ReturnVal = ctypes.c_bool 
self.tesseract.TessBaseAPIProcessPages.argtypes = [self.api, path, None, 0, PTessResultRenderer] 
self.tesseract.TessBaseAPIProcessPages.restype = ReturnVal 
self.tesseracto.TessBaseAPIProcessPages 

class TessResultRenderer(ctypes.Structure): 
    pass 
+1

The default result type is 'c_int'. That is also the default conversion type for integer arguments. Learn how to set 'restype' and 'argtypes'. – eryksun
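
To illustrate why that default matters here: on a typical 64-bit platform a pointer is 8 bytes while 'c_int' is 4, so a pointer returned through the default 'restype' loses its upper half, and passing the mangled value back into the library can crash. The sketch below only imitates that truncation; `fake_pointer` is a made-up address, not a real Tesseract handle.

```python
import ctypes

# Hypothetical 64-bit address, standing in for what TessTextRendererCreate might return.
fake_pointer = 0x7F1234567890

# ctypes integer types do no overflow checking, so this keeps only the low 32 bits.
# That mirrors what happens to a pointer returned through the default c_int restype.
truncated = ctypes.c_int(fake_pointer).value

print(ctypes.sizeof(ctypes.c_int), ctypes.sizeof(ctypes.c_void_p))
print(hex(fake_pointer))
print(hex(truncated))  # 0x34567890 where c_int is 32 bits wide
```

Setting 'restype' to a pointer type tells ctypes to read the whole return value as an address instead.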

+0

@eryksun The result type of 'TessTextRendererCreate' is 'new TessTextRenderer'. I know about argtypes, but I don't know how to apply them here. – Anthony

+1

Use an opaque type: 'class TessResultRenderer(ctypes.Structure): pass'. Create a pointer type for it: 'PTessResultRenderer = ctypes.POINTER(TessResultRenderer)'. Then set 'tesseract.TessTextRendererCreate.restype = PTessResultRenderer'. – eryksun

Answers
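
A minimal sketch of that suggestion, which may also explain the "has no from_param method" error from the question's edit: 'argtypes' has to be a sequence of ctypes types, not argument values such as the string "stdout" or the integer 0, because ctypes calls 'from_param' on each entry when the function is invoked. The snippet does not load Tesseract at all; the helper `accepts` is a hypothetical name used only to exercise the type check.

```python
import ctypes

# Opaque (incomplete) types: no fields are declared because Python only
# ever passes pointers to these objects back into the library.
class TessBaseAPI(ctypes.Structure):
    pass

class TessResultRenderer(ctypes.Structure):
    pass

PTessBaseAPI = ctypes.POINTER(TessBaseAPI)
PTessResultRenderer = ctypes.POINTER(TessResultRenderer)

# These lists are what would be assigned to .argtypes: types, not values.
create_argtypes = [ctypes.c_char_p]
process_argtypes = [PTessBaseAPI, ctypes.c_char_p, ctypes.c_char_p,
                    ctypes.c_int, PTessResultRenderer]

def accepts(ctype, value):
    """Return True if ctypes would accept value for a parameter of type ctype."""
    try:
        ctype.from_param(value)
        return True
    except TypeError:
        return False

print(accepts(PTessResultRenderer, None))            # True: None is passed as NULL
print(accepts(PTessResultRenderer, PTessBaseAPI()))  # False: wrong pointer type
print(accepts(PTessResultRenderer, 12345))           # False: a bare integer is rejected
print(hasattr("stdout", "from_param"))               # False: a str is not a ctypes type
```

With distinct pointer types, handing the wrong handle to a function raises a Python exception at the call site instead of crashing inside the library.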

3

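There is an example of using the tesseract C API from ctypes in the contrib folder, but it seems a bit outdated: contrib/tesseract-c_api-demo.py

You need to set the restype and argtypes of several functions. Also, don't forget to call the init function on the handle. The following example works for me. It reads the text from the English file "test.bmp" into the text variable.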

from ctypes import * 
from ctypes.util import find_library 

lang = b"eng" 
filename = b"test.bmp" 
TESSDATA_PREFIX = b"/usr/local/Cellar/tesseract/3.04.01_1/share/tessdata" 

# find_library expects the bare library name, without the "lib" prefix or the ".dylib" suffix
path = find_library("tesseract")
tesseract = CDLL(path)

class TessBaseAPI(Structure): 
    pass 
class TessResultRenderer(Structure): 
    pass 

# Declare result and argument types so that pointers are not truncated to c_int
tesseract.TessBaseAPICreate.restype = POINTER(TessBaseAPI)
tesseract.TessBaseAPIInit3.argtypes = [POINTER(TessBaseAPI), c_char_p, c_char_p]
# TessBaseAPIInit3 returns an int status: 0 on success, nonzero on failure
tesseract.TessBaseAPIInit3.restype = c_int
tesseract.TessBaseAPIProcessPages.argtypes = [POINTER(TessBaseAPI), c_char_p, c_char_p, c_int, POINTER(TessResultRenderer)]
tesseract.TessBaseAPIProcessPages.restype = c_bool
tesseract.TessBaseAPIGetUTF8Text.argtypes = [POINTER(TessBaseAPI)]
tesseract.TessBaseAPIGetUTF8Text.restype = c_char_p
tesseract.TessBaseAPIDelete.argtypes = [POINTER(TessBaseAPI)]
tesseract.TessBaseAPIDelete.restype = None

api = tesseract.TessBaseAPICreate()
rc = tesseract.TessBaseAPIInit3(api, TESSDATA_PREFIX, lang)
if rc != 0:
    # Initialization failed: release the handle before exiting
    tesseract.TessBaseAPIDelete(api)
    print("Could not initialize tesseract.\n")
    exit(3)

success = tesseract.TessBaseAPIProcessPages(api, filename, None , 0, None) 

if success: 
    text = tesseract.TessBaseAPIGetUTF8Text(api) 
    print("="*78) 
    print(text.decode("utf-8").strip()) 
    print("="*78) 

The output looks like this:

============================================================================== 
This is a lot of 12 point text to test the 
ocr code and see if it works on all types 
of file format. 

The quick brown dog jumped over the 
lazy fox. The quick brown dog jumped 
over the lazy fox. The quick brown dog 
jumped over the lazy fox. The quick 
brown dog jumped over the lazy fox. 
============================================================================== 

Edit: replaced the use of c_void_p with opaque types, as suggested by eryksun. Thanks!

+1

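'c_void_p' works, but it is not type safe. You should take advantage of every tool that gives you a clean Python exception for a programming error, rather than crashing the process with a segfault. Can you integrate my suggestion to define opaque types? It only takes a few lines of code to define 'PTessBaseAPI' and 'PTessResultRenderer' in place of 'c_void_p'. – eryksun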

+0

@eryksun I had not come across opaque types before reading your comment. Thanks for the enlightenment! – Snorfalorpagus

+0

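@Snorfalorpagus Thanks! This worked for me. Also, thanks for commenting on the related issue in the tesseract repo. I opened another issue where I get different results from the tesseract command line than from the API: https://github.com/tesseract-ocr/tesseract/issues/312 Do you know what might cause the difference? – Anthony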

0

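A segmentation fault occurs when you run off the end of an array or dereference a null pointer. If you use a debugger, it will step you through the code and show you what is happening.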
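
For segfaults that originate in a C library called from Python, the standard-library faulthandler module can complement a debugger by dumping the Python traceback when the process receives SIGSEGV. A minimal sketch, where the log file name and location are arbitrary choices made for illustration:

```python
import faulthandler
import os
import tempfile

# Hypothetical log location; any writable file (or sys.stderr) would do.
log_path = os.path.join(tempfile.gettempdir(), "tesseract_crash.log")
crash_log = open(log_path, "w")  # must stay open for as long as the handler is enabled

# From here on, a SIGSEGV raised inside a ctypes call writes the Python
# traceback to crash_log before the process dies, which shows the Python
# line that made the faulting call.
faulthandler.enable(file=crash_log)

print(faulthandler.is_enabled())
```

Running the script with `python -X faulthandler` enables the same handler without code changes, writing to stderr.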