Nested dictionaries and multiprocessing


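I have a dictionary of authors. Each author is a dictionary of books, and each book is a list of words.

I need a multiprocessing setup in which each process handles one book by one author.

I tried instantiating the dictionaries and lists with manager.dict() and manager.list(), but my dictionary still does not get populated.

This is how the main dictionary object is declared.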

import multiprocessing
import os
from multiprocessing import Manager

manager = Manager()
allWords = manager.dict()

Then there is a function read_author that does the task distribution.

def read_author(author):
    global allWords
    allWords[author] = manager.dict()  # each author is a dictionary of books
    jobs = []
    # note: auth_dir is not defined in the posted snippet
    for f in os.listdir(auth_dir):
        p = multiprocessing.Process(target=read_doc, args=(author, auth_dir, f))
        jobs.append(p)
        p.start()
    return jobs

This is the function that does the processing.

def read_doc(author_name, author_dir, doc_name):
    global allWords
    allWords[author_name][doc_name] = manager.list()
    # document is loaded in the variable doc and it has a set of words
    for word in doc.words:
        allWords[author_name][doc_name].append(word)

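The documents come from Project Gutenberg txt files, and the doc object above is a syntax tree built with spaCy.

read_doc actually involves parsing the document tree, extracting bigrams and counting them. For brevity I skipped those parts in the code sample, but that counting is the task I want to split across multiple CPU cores, which is why I am using multiprocessing.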
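For reference, one documented property of Manager proxies may be related to the symptom described here, though it is not confirmed as the cause. When a plain dict or list is stored inside a proxy, reading it back returns a copy, so mutating that copy is not propagated; only an assignment on the proxy itself (which triggers `__setitem__`) is. Nesting proxies inside proxies is documented as supported only from Python 3.6 on. The sketch below illustrates this; the helper name `add_word` and the sample keys are made up for illustration.

```python
from multiprocessing import Manager


def add_word(shared, author, book, word):
    """Append word to shared[author][book] via read-modify-write.

    Works for a plain dict and for a manager.dict() proxy holding plain dicts.
    """
    books = shared.get(author, {})           # a local copy when shared is a proxy
    books.setdefault(book, []).append(word)  # mutate the local copy
    shared[author] = books                   # reassign so the proxy sees the change


if __name__ == "__main__":
    with Manager() as manager:
        shared = manager.dict()
        shared["austen"] = {}                 # plain dict stored inside the proxy
        shared["austen"]["emma"] = ["lost"]   # mutates a temporary copy only
        print(dict(shared))                   # the inner dict is still empty here

        add_word(shared, "austen", "emma", "kept")
        print(dict(shared))                   # now the inner dict holds the word
```

The read-modify-write in `add_word` is not atomic, so several processes updating the same author at once can overwrite each other's changes unless a lock is used.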

Source

2016-12-17 Vahid

Where is your code? –

Try writing an [MCVE] that shows the problem you are running into – pvg

@PedroLobito I added some code. Could you take another look? – Vahid

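The Python multiprocessing programming guidelines recommend avoiding shared state as far as possible.
Although it is not quite clear why your code does not work, I see no reason to use Manager and shared state here.
In this example the main process assembles the final allWords dictionary from the word lists produced in the Pool processes: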

import collections
import multiprocessing
import os

def read_doc(args):
    # pool.map passes each yielded (author, f) tuple as a single argument
    author_name, doc_name = args
    # document is loaded in the variable doc and it has a set of words
    return author_name, doc_name, list(doc.words)

def read_doc_param_gen(authors):
    for author in authors:
        auth_dir = deduce_auth_dir(author)
        for f in os.listdir(auth_dir):
            yield author, f

def read_authors(authors):
    pool = multiprocessing.Pool()
    allWords = collections.defaultdict(dict)
    for author_name, doc_name, lst in pool.map(read_doc, read_doc_param_gen(authors)):
        allWords[author_name][doc_name] = lst
    # release the worker processes once all results are collected
    pool.close()
    pool.join()
    return allWords
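Since the question mentions that the real work is counting bigrams, the same pattern can return a per-document counter instead of a word list. This is only a minimal sketch: the names `count_bigrams`, `merge_counts` and `count_all` are made up for illustration, and the words are passed in directly rather than loaded with spaCy.

```python
import collections
import multiprocessing


def count_bigrams(task):
    """Worker: count adjacent word pairs in one document."""
    author_name, doc_name, words = task
    counts = collections.Counter(zip(words, words[1:]))
    return author_name, doc_name, counts


def merge_counts(results):
    """Parent: assemble {author: {doc: Counter}} from the worker results."""
    all_counts = collections.defaultdict(dict)
    for author_name, doc_name, counts in results:
        all_counts[author_name][doc_name] = counts
    return all_counts


def count_all(tasks):
    """Distribute the documents over a pool and merge in the parent."""
    with multiprocessing.Pool() as pool:
        return merge_counts(pool.map(count_bigrams, tasks))


if __name__ == "__main__":
    # Toy input; in the real case each worker would load its own file.
    tasks = [
        ("austen", "emma.txt", ["to", "be", "or", "not", "to", "be"]),
        ("melville", "moby.txt", ["call", "me", "ishmael"]),
    ]
    for author, docs in count_all(tasks).items():
        for doc, counts in docs.items():
            print(author, doc, counts.most_common(2))
```

Each worker only returns its own result, so nothing is shared between processes and no Manager is needed.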

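There is also Pool.imap, which yields results one at a time as they become available, if you need to update a GUI or something similar while the work is running.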
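A rough sketch of how Pool.imap could drive a progress display, under the assumption that each task is a small tuple; `word_count`, `run_with_progress` and the `report` callback are hypothetical names, and `print` stands in for whatever updates the GUI.

```python
import multiprocessing


def word_count(task):
    """Worker: count whitespace-separated words in one document's text."""
    author_name, doc_name, text = task
    return author_name, doc_name, len(text.split())


def run_with_progress(tasks, report=print):
    """Process tasks in a pool and report after each finished document."""
    results = {}
    with multiprocessing.Pool() as pool:
        # imap yields results lazily, in the same order as the input tasks
        for done, (author, doc, n) in enumerate(pool.imap(word_count, tasks), start=1):
            results[(author, doc)] = n
            report(f"{done}/{len(tasks)} documents processed")
    return results


if __name__ == "__main__":
    tasks = [
        ("austen", "emma.txt", "to be or not to be"),
        ("melville", "moby.txt", "call me ishmael"),
    ]
    print(run_with_progress(tasks))
```

If the order of results does not matter, Pool.imap_unordered reports each document as soon as it finishes rather than waiting for earlier ones.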

Source

2016-12-17 10:25:45 robyschek
