2014-10-08

I'm planning to use multiprocessing in my code to get better performance. Can I use multiprocessing.Pool inside a method of a class?

However, I get the following error:

Traceback (most recent call last):
  File "D:\EpubBuilder\TinyEpub.py", line 49, in <module>
    e.epub2txt()
  File "D:\EpubBuilder\TinyEpub.py", line 43, in epub2txt
    tempread = self.get_text()
  File "D:\EpubBuilder\TinyEpub.py", line 29, in get_text
    txtlist = pool.map(self.char2text,charlist)
  File "C:\Python34\lib\multiprocessing\pool.py", line 260, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Python34\lib\multiprocessing\pool.py", line 599, in get
    raise self._value
  File "C:\Python34\lib\multiprocessing\pool.py", line 383, in _handle_tasks
    put(task)
  File "C:\Python34\lib\multiprocessing\connection.py", line 206, in send
    self._send_bytes(ForkingPickler.dumps(obj))
  File "C:\Python34\lib\multiprocessing\reduction.py", line 50, in dumps
    cls(buf, protocol).dump(obj)
TypeError: cannot serialize '_io.BufferedReader' object

Other approaches I tried gave this error instead:

TypeError: cannot serialize '_io.TextIOWrapper' object 

My code looks like this:

from multiprocessing import Pool

class Book(object):
    def __init__(self, arg):
        self.namelist = arg

    def format_char(self, char):
        char = char + "a"
        return char

    def format_book(self):
        self.tempread = ""
        charlist = [f.read() for f in self.namelist]  # list of char
        with Pool() as pool:
            txtlist = pool.map(self.format_char, charlist)
        self.tempread = "".join(txtlist)
        return self.tempread

if __name__ == '__main__':
    import os
    b = Book([open(f) for f in os.listdir()])
    t = b.format_book()
    print(t)

I think the error is raised because I'm not using Pool in the main function.

Is my guess right? How can I change my code to fix the error?


What does 'type(charlist[0])' say? This is a bit confusing, because your error message doesn't match the code you posted ('char2text' vs. 'format_char'). – 2014-10-08 05:05:41


@JohnZwinck My real code is very long; the code here is a simplified version. If it looks confusing, I'll edit it. type(charlist[0]) is 'string'. – PaleNeutron 2014-10-08 05:10:49

Answer


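The problem is that your Book instance has an instance variable that can't be pickled (namelist). Because you're passing an instance method to pool.map, the whole instance has to be pickled so it can be sent to the worker processes: pickling a bound method pickles the object it is bound to. Book.namelist holds open file objects (_io.BufferedReader in your traceback), which can't be pickled. There are a couple of ways to fix this. Based on your sample code, it looks like you could simply make format_char a top-level function: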

from multiprocessing import Pool


def format_char(char):
    char = char + "a"
    return char


class Book(object):
    def __init__(self, arg):
        self.namelist = arg

    def format_book(self):
        self.tempread = ""
        charlist = [f.read() for f in self.namelist]  # list of char
        with Pool() as pool:
            # format_char is a module-level function, so it is pickled by
            # reference (module + name); the Book instance is never pickled
            txtlist = pool.map(format_char, charlist)
        self.tempread = "".join(txtlist)
        return self.tempread
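To see why this version works and the original one doesn't, here is a small sketch that skips Pool entirely and calls pickle.dumps directly, which is roughly what pool.map does to each task before sending it to a worker (see ForkingPickler.dumps in the traceback). It assumes os.devnull can be opened in binary mode as a stand-in for the real files; the can_pickle helper is hypothetical and exists only for this demo.

```python
import os
import pickle


def format_char(char):
    # Module-level function: pickle stores only a reference (module + name).
    return char + "a"


class Book(object):
    def __init__(self, arg):
        self.namelist = arg  # list of open file objects, which can't be pickled

    def format_char(self, char):
        return char + "a"


def can_pickle(obj):
    """Hypothetical helper: True if pickle.dumps(obj) succeeds, False on TypeError."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False


f = open(os.devnull, "rb")  # an _io.BufferedReader, like in the traceback
b = Book([f])
print(can_pickle(format_char))    # True: nothing from Book travels with it
print(can_pickle(b.format_char))  # False: pickling the bound method pickles b, and b.namelist holds a file
f.close()
```

The bound method b.format_char carries the whole instance with it, file objects included, which is exactly what the TypeError in the question is complaining about.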

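However, if in your real code format_char needs to be an instance method, you can use __getstate__/__setstate__ to make Book picklable, by removing namelist from the instance's state before it is pickled: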

from multiprocessing import Pool


class Book(object):
    def __init__(self, arg):
        self.namelist = arg

    def __getstate__(self):
        """ This is called before pickling. """
        state = self.__dict__.copy()
        del state['namelist']  # drop the unpicklable file objects
        return state

    def __setstate__(self, state):
        """ This is called while unpickling. """
        self.__dict__.update(state)

    def format_char(self, char):
        char = char + "a"
        return char

    def format_book(self):
        self.tempread = ""
        charlist = [f.read() for f in self.namelist]  # list of char
        with Pool() as pool:
            txtlist = pool.map(self.format_char, charlist)
        self.tempread = "".join(txtlist)
        return self.tempread

This is fine as long as you don't need to access namelist in the child processes.
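One way to check the __getstate__/__setstate__ version without starting a Pool is to round-trip the bound method through pickle and inspect the copy a worker would receive. This sketch trims Book down to the methods that matter for pickling and again assumes os.devnull as a stand-in for the open files.

```python
import os
import pickle


class Book(object):
    def __init__(self, arg):
        self.namelist = arg

    def __getstate__(self):
        # Called by pickle: copy the instance dict and drop the unpicklable file list.
        state = self.__dict__.copy()
        del state['namelist']
        return state

    def __setstate__(self, state):
        # Called when unpickling (i.e. in the worker): restore everything except namelist.
        self.__dict__.update(state)

    def format_char(self, char):
        return char + "a"


f = open(os.devnull, "rb")
b = Book([f])

# Round-trip the bound method, roughly as Pool.map does when it sends a task to a worker.
method_copy = pickle.loads(pickle.dumps(b.format_char))
print(method_copy("x"))                           # xa
print(hasattr(method_copy.__self__, "namelist"))  # False: the worker's copy has no namelist
f.close()
```

The original instance keeps its namelist (only the copied state loses it), which is why the parent process can still read the files while the workers only see the strings passed to format_char.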


Thanks! It works fine now, and my guess was wrong. – PaleNeutron 2014-10-08 05:19:15
