Running multiple tasks at once in Python

I am trying to apply a function to each pair of adjacent elements in a given dataset. Please see the example below.

# I'll just make a simple function here. 
# In my real case, I send request to database 
# to get the result with two arguments. 

def get_data_from_db_with(arg1, arg2): 
    # write a query with arg1 and arg2 named 'query_result' 
    return query_result 

data = [arg1, arg2, arg3, arg4] 
result = [] 
for a, b in zip(data, data[1:]): 
    result.append(get_data_from_db_with(a, b))

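For example, if the data has length 4, as above, I send 3 requests to the database. Each request takes about 0.3 seconds to retrieve its data, so the total is 0.9 seconds (0.3 s × 3 requests). The problem is that the total time grows with the number of requests. What I would like to do, if possible, is send all the requests at once. Basically, it would look like this.

With the code above,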

1) get_data_from_db_with(arg1, arg2) 
2) get_data_from_db_with(arg2, arg3) 
3) get_data_from_db_with(arg3, arg4)

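are processed sequentially, one after another.

What I want, if possible, is to send all the requests at once instead of sequentially. The number of requests stays the same, of course, but I assume the total time spent would go down.

I am now looking into async, multiprocessing, and so on. Any comments or feedback would be very helpful.

Thanks in advance.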

Source

2017-04-14 Gee Yeol Nahm

Threads may be what you are looking for, assuming that most of the work in get_data_from_db_with is waiting on I/O, such as a call to the database.

import threading 

def get_data_from_db_with(arg1, arg2): 
    # write a query with arg1 and arg2 named 'query_result' 
    current_thread = threading.current_thread() 
    current_thread.result = query_result 

data = [arg1, arg2, arg3, arg4] 
threads = [] 
for a, b in zip(data, data[1:]): 
    t = threading.Thread(target=get_data_from_db_with, args=(a,b)) 
    t.start() 
    threads.append(t) 

results = [] 
for t in threads: 
    t.join() 
    results.append(t.result)

Note that this solution even preserves the order of the results in the results list.

Source
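For comparison, the same idea can be written with the standard library's `concurrent.futures.ThreadPoolExecutor`, which returns values directly instead of attaching them to the thread object. This is only a sketch: the body of `get_data_from_db_with` below is a stand-in that sleeps to mimic the I/O wait (an assumption, since the real query is not shown), and the string values in `data` are placeholders.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def get_data_from_db_with(arg1, arg2):
    # Stand-in for the real database query (assumption): sleep to mimic I/O wait.
    time.sleep(0.1)
    return (arg1, arg2)


data = ["arg1", "arg2", "arg3", "arg4"]  # placeholder values

# One worker per adjacent pair so every request can be in flight at once.
with ThreadPoolExecutor(max_workers=len(data) - 1) as pool:
    # map() takes several iterables like zip() does and yields results
    # in input order, regardless of which request finishes first.
    results = list(pool.map(get_data_from_db_with, data, data[1:]))

print(results)
```

For a large number of pairs, a fixed `max_workers` smaller than the number of requests is probably kinder to the database than one thread per request.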

2017-04-14 10:03:03 freakish

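Thanks for the advice! I have one question about using `threading`. As far as I know, multiprocessing is preferred over multithreading in Python because of the GIL (Global Interpreter Lock). I may be wrong, I am just curious. –

@GeeYeolNahm It depends entirely on what you are trying to do. Every I/O operation releases the GIL, so as long as you spend most of the time doing I/O (as opposed to CPU-intensive work), threads are preferable to processes. – freakish

I tested multithreading and it works! On average it is 2 to 3 times faster. So yes, multithreading handles this case in my working environment. Thanks again, freakish! –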
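The point made in the comments above, that threads help when the work is mostly waiting on I/O, can be checked with a small timing experiment. The sketch below uses `time.sleep` as a stand-in for a blocking database call (an assumption; `fake_io` is a hypothetical helper), since sleeping releases the GIL much as a blocking socket read does.

```python
import threading
import time


def fake_io(seconds):
    # Hypothetical stand-in for a blocking database call.
    # time.sleep releases the GIL while waiting.
    time.sleep(seconds)


def run_sequential(n, seconds):
    start = time.perf_counter()
    for _ in range(n):
        fake_io(seconds)
    return time.perf_counter() - start


def run_threaded(n, seconds):
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_io, args=(seconds,)) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


sequential = run_sequential(3, 0.2)
threaded = run_threaded(3, 0.2)
print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

With CPU-bound work in place of the sleep, the threaded version would not be expected to show the same gain, which is where multiprocessing becomes the better fit.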

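An alternative to multiprocessing is to work on the query construction itself. Look for ways to combine the queries, such as (arg1 and arg2) or (arg2 and arg3)..., essentially trying to fetch all the data you need in a single call.

Source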
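As a rough illustration of combining the pairwise lookups into one query, here is a sketch using an in-memory SQLite database. The table `links` and its columns `a`, `b`, and `value` are hypothetical, invented only for this example; the real schema and database are not shown in the question.

```python
import sqlite3

data = ["arg1", "arg2", "arg3", "arg4"]  # placeholder values
pairs = list(zip(data, data[1:]))

# Hypothetical table, created only so the example runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (a TEXT, b TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO links VALUES (?, ?, ?)",
    [("arg1", "arg2", 10), ("arg2", "arg3", 20),
     ("arg3", "arg4", 30), ("arg1", "arg4", 99)],
)

# Build "(a = ? AND b = ?) OR (a = ? AND b = ?) ..." with one group per pair.
where = " OR ".join("(a = ? AND b = ?)" for _ in pairs)
params = [value for pair in pairs for value in pair]
rows = conn.execute(
    f"SELECT a, b, value FROM links WHERE {where}", params
).fetchall()

# The database does not guarantee row order, so map rows back to the pairs.
by_pair = {(a, b): value for a, b, value in rows}
results = [by_pair.get(pair) for pair in pairs]
print(results)
conn.close()
```

One round trip replaces three here, at the cost of having to match the returned rows back to the pair that asked for them.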

2017-04-14 09:57:36

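Thanks for sharing your idea. Yes, I did look into sending a single request, as you mentioned. I am still working on writing the single query and parsing its results. I have been using the [Elasticsearch multi search API](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-multi-search.html). Most importantly, I think sending one request performs better than sending multiple requests at the same time! –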
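For the multi search route mentioned in this comment, the request body is newline-delimited JSON: a header line followed by a search body line for each query, with a trailing newline at the end. The sketch below only builds that body with the standard library. The index name `my_index`, the field names `field_a` and `field_b`, and the helper `build_msearch_body` are all hypothetical, and the `bool`/`term` query shape is an assumption about what each pairwise lookup might look like.

```python
import json


def build_msearch_body(pairs, index_name):
    # Hypothetical helper: two NDJSON lines per query, header then search body.
    lines = []
    for a, b in pairs:
        lines.append(json.dumps({"index": index_name}))
        lines.append(json.dumps({
            "query": {
                "bool": {
                    "filter": [
                        {"term": {"field_a": a}},  # hypothetical field name
                        {"term": {"field_b": b}},  # hypothetical field name
                    ]
                }
            }
        }))
    # The multi search body has to end with a newline.
    return "\n".join(lines) + "\n"


data = ["arg1", "arg2", "arg3", "arg4"]  # placeholder values
body = build_msearch_body(list(zip(data, data[1:])), "my_index")
print(body)
```

The body would then be sent to the `_msearch` endpoint with the `application/x-ndjson` content type. The `responses` array in the reply follows the order of the queries in the request, so the results line up with the adjacent pairs.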
