Dynamically loading threaded classes in Python

I have the code below, which iterates through the modules in a directory and loads them dynamically:

import os
import pkgutil
import sys

# `path` and the Scrape_BASE module are assumed to be defined/imported elsewhere.
modules = pkgutil.iter_modules(path=[os.path.join(path, 'scrapers')])
for loader, mod_name, ispkg in modules:
    # Ensure that module isn't already loaded, and that it isn't the parent class
    if (mod_name not in sys.modules) and (mod_name != "Scrape_BASE"):
        # Import module
        loaded_mod = __import__('scrapers.' + mod_name, fromlist=[mod_name])
        # Load class from imported module. Make sure the module and the class are named the same
        class_name = mod_name
        loaded_class = getattr(loaded_mod, class_name)
        # Only instantiate subclasses of Scrape_BASE
        if issubclass(loaded_class, Scrape_BASE.Scrape_BASE):
            # Create an instance of the class and run it
            instance = loaded_class()
            instance.start()
            instance.join()
            text = instance.GetText()

In most of the classes I read a PDF from a website, scrape its content, and set the text that GetText() later returns.

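In some cases the PDF is too large and I end up with a segmentation fault. Is there a way to monitor the thread so that it times out after 3 minutes or so? Does anyone have suggestions on how I could implement this?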

Source

2014-10-11 Sid Kwakkel

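The right way to do this is to change the code in those classes you haven't shown us so that they don't run forever. If that's possible, you should definitely do it. And if what you're trying to time out is "reading the PDF from a website", it almost certainly is possible.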
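As a rough illustration of building the timeout into the download itself, here is a minimal sketch using only the standard library. The function name `fetch_bytes` and its parameters are made up for this example, and I'm assuming the PDF is fetched over a URL that `urllib.request.urlopen` can handle. Note that the `timeout` argument of `urlopen` bounds each blocking socket operation rather than the whole transfer, so the sketch also checks an overall deadline between chunks.

```python
import time
import urllib.request


def fetch_bytes(url, socket_timeout=30.0, total_timeout=180.0, chunk_size=64 * 1024):
    """Download url, giving up if a read stalls or the whole transfer runs too long.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    deadline = time.monotonic() + total_timeout
    chunks = []
    # `timeout` here limits each blocking socket operation, not the full download.
    with urllib.request.urlopen(url, timeout=socket_timeout) as response:
        while True:
            # Enforce the overall limit between chunk reads.
            if time.monotonic() > deadline:
                raise TimeoutError(f"download exceeded {total_timeout} seconds")
            chunk = response.read(chunk_size)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

This only covers the download. If the time is actually spent parsing the PDF in C code, a check like this never gets a chance to run.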

But sometimes it isn't possible; sometimes you're just calling some C function that has no timeout. So what do you do about that?

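Well, threads can't be interrupted, so you need to use processes instead. multiprocessing.Process is very similar to threading.Thread, except that it runs the code in a child process instead of in a thread within the same process.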
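A minimal sketch of that idea, assuming the work can be wrapped in a plain function: `slow_scrape` is a hypothetical stand-in that just sleeps, and `run_with_timeout` is a name invented for this example. The parent waits with `join(timeout)` and terminates the child if it is still alive afterwards.

```python
import multiprocessing
import time


def slow_scrape(seconds):
    """Hypothetical stand-in for a scraper that may hang; it just sleeps."""
    time.sleep(seconds)


def run_with_timeout(target, args=(), timeout=180.0):
    """Run target(*args) in a child process; return True if it finished cleanly in time."""
    proc = multiprocessing.Process(target=target, args=args)
    proc.start()
    proc.join(timeout)        # wait at most `timeout` seconds
    if proc.is_alive():
        proc.terminate()      # the child is still running, so stop it
        proc.join()           # reap the terminated child
        return False
    # exitcode is 0 on success and negative if the child was killed by a signal,
    # which is what a segmentation fault in the child looks like on POSIX.
    return proc.exitcode == 0


if __name__ == "__main__":
    print(run_with_timeout(slow_scrape, args=(0.1,), timeout=5.0))
    print(run_with_timeout(slow_scrape, args=(60,), timeout=0.5))
```

A side benefit for the segfault case: a crash in the child shows up as a nonzero `exitcode` in the parent instead of taking the whole program down.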

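This does mean that you can't share any global data with your workers without making it explicit, but that's usually a good thing. However, it also means that the input data (which in this case doesn't seem to be anything) and the output (which seems to be one big string) have to be picklable and explicitly passed over queues. This is pretty easy to do; read the "Exchanging objects between processes" section of the multiprocessing documentation for details.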
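Here is a small sketch of passing the result back over a queue, with a timeout on the read. Everything named here (`scrape_worker`, `get_text_with_timeout`, the `delay` parameter) is hypothetical; the worker just upper-cases a string where a real one would return the scraped text.

```python
import multiprocessing
import queue
import time


def scrape_worker(out_queue, text, delay):
    """Hypothetical worker; a real one would download and parse a PDF."""
    time.sleep(delay)              # stands in for slow or hanging work
    out_queue.put(text.upper())    # the result must be picklable


def get_text_with_timeout(text, delay=0.0, timeout=180.0):
    """Return the worker's result, or None if nothing arrives within `timeout`."""
    out_queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=scrape_worker, args=(out_queue, text, delay))
    proc.start()
    try:
        # Read from the queue before joining: a child that has queued a large
        # item may not exit until that item has been consumed.
        result = out_queue.get(timeout=timeout)
    except queue.Empty:
        proc.terminate()           # nothing arrived in time, so stop the child
        result = None
    proc.join()
    return result


if __name__ == "__main__":
    print(get_text_with_timeout("some pdf text", timeout=5.0))
```

One limitation of this sketch: if the child crashes before putting anything on the queue, the parent still waits the full timeout before giving up.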

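While we're at it, you may want to consider rethinking your design in terms of tasks instead of threads. If you have 200 PDFs to download, you don't want 200 threads; you probably want 8 or 12 threads, all servicing a queue of 200 jobs. The multiprocessing module has support for process pools, but you may find concurrent.futures a better fit for this. Both multiprocessing.Pool and concurrent.futures.ProcessPoolExecutor let you just pass a function and some arguments and then wait for the results, without having to worry about scheduling or queues or anything else.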
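A minimal sketch of the task-based design, assuming each job can be expressed as a call to one function. `scrape` and `scrape_all` are hypothetical names, and the task body is a placeholder for fetching and parsing one PDF.

```python
from concurrent.futures import ProcessPoolExecutor


def scrape(job_id):
    """Hypothetical task; a real one would fetch and parse one PDF."""
    return f"text of document {job_id}"


def scrape_all(job_ids, max_workers=8):
    """Run scrape() over all jobs using a fixed pool of worker processes."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # map() hands the jobs to the workers and yields results in input order.
        return list(pool.map(scrape, job_ids))


if __name__ == "__main__":
    texts = scrape_all(range(200))
    print(len(texts))
```

Keep in mind that waiting on a future with a timeout only stops the waiting; it does not kill a worker that is stuck. If individual jobs can hang, the per-process approach with an explicit terminate is still needed for those.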

Source

2014-10-13 07:58:21 abarnert

Lots of great ideas here. I'll look into a few of them and see whether they can help me. – 2015-03-06 22:33:12
