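I posted an earlier question here ("Dynamically loading thread classes in Python") and solved it.
My new problem is with the code at the end of that solution, which iterates through the modules in a directory and loads them dynamically: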
modules = pkgutil.iter_modules(path=[os.path.join(path, 'scrapers')])
for loader, mod_name, ispkg in modules:
    # Ensure that the module isn't already loaded, and that it isn't the parent class
    if (mod_name not in sys.modules) and (mod_name != "Scrape_BASE"):
        # Import the module
        loaded_mod = __import__('scrapers.' + mod_name, fromlist=[mod_name])
        # Load the class from the imported module. The module and the class must have the same name
        class_name = mod_name
        loaded_class = getattr(loaded_mod, class_name)
        # Only instantiate subclasses of Scrape_BASE
        if issubclass(loaded_class, Scrape_BASE.Scrape_BASE):
            # Create an instance of the class and run it
            instance = loaded_class()
            instance.start()
            instance.join()
            text = instance.GetText()
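In most of these classes I read a PDF from a website, scrape its content, and store the text that GetText() later returns.
In some cases the PDF is too large and I end up with a segmentation fault. Is there a way to monitor the thread so that it times out after 3 minutes or so? Does anyone have a suggestion for how I could implement this?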
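One direction I am considering, as a rough sketch only: a Python thread cannot be killed from outside, and instance.join(180) would only stop waiting while the thread keeps running. A segmentation fault also takes down the whole interpreter, not just the thread. Running each scraper in a child process would let the parent enforce the time limit and survive a crash. The helper name run_isolated and the idea of passing the scraper as a code string are my own assumptions, not part of the existing code.

```python
import subprocess
import sys


def run_isolated(code, timeout=180.0):
    """Run a Python snippet in a separate interpreter (hypothetical helper).

    Returns (status, output) where status is "ok", "timeout" or "crashed".
    """
    try:
        done = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,  # collect stdout and stderr
            text=True,            # decode them to str
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before re-raising this exception.
        return "timeout", ""
    if done.returncode != 0:
        # Non-zero exit status; on POSIX a negative value means the child
        # was killed by a signal (a segfault on Linux shows up as -11).
        return "crashed", done.stderr
    return "ok", done.stdout
```

Under these assumptions, each scraper would be launched with a snippet that imports its class from the scrapers package, runs it, and prints the result of GetText(); the parent then reads the text from the returned output and moves on to the next module when the status is "timeout" or "crashed".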
Lots of great ideas here. I'll look into a few of them and see whether they help me. – 2015-03-06 22:33:12