2016-07-25 141 views
0
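import random
import threading
import time

def dowork():
    y = []
    z = []
    ab = 0
    start_time = time.time()
    t = threading.current_thread()

    # build two lists of random bases and exponents
    for x in range(0, 1500):
        y.append(random.randint(0, 100000))
    for x in range(0, 1500):
        z.append(random.randint(0, 1000))
    # CPU-bound part: sum of large integer powers, repeated 100 times
    for x in range(0, 100):
        for k in range(0, len(z)):
            ab += y[k] ** z[k]
    # note: on Python 3.11+ an int this large may need sys.set_int_max_str_digits(0) before printing
    print(" %.50s..." % ab)
    print("--- %.6s seconds --- %s" % (time.time() - start_time, t.name))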

#do the work! 
threads = [] 
for x in range(0,4): #4 threads 
    threads.append(threading.Thread(target=dowork)) 

for x in threads: 
    x.start() # and they are off 

Result (multithreaded vs. single-threaded computation):

23949968699026357507152486869104218631097704347109... 
--- 11.899 seconds --- Thread-2 
10632599432628604090664113776561125984322566079319... 
--- 11.924 seconds --- Thread-4 
20488842520966388603734530904324501550532057464424... 
--- 12.073 seconds --- Thread-1 
17247910051860808132548857670360685101748752056479... 
--- 12.115 seconds --- Thread-3 
[Finished in 12.2s] 

Now let's do it in a single thread:

def dowork():
    y = []
    z = []
    ab = 0
    start_time = time.time()
    t = threading.current_thread()

    for x in range(0, 1500):
        y.append(random.randint(0, 100000))
    for x in range(0, 1500):
        z.append(random.randint(0, 1000))
    for x in range(0, 100):
        for k in range(0, len(z)):
            ab += y[k] ** z[k]
    print(" %.50s..." % ab)
    print("--- %.6s seconds --- %s" % (time.time() - start_time, t.name))

# print(threadtest()) 
threads = [] 
for x in range(0,4): 
    threads.append(True) 

for x in threads: 
    dowork() 

Result:

14283744921265630410246013584722456869128720814937... 
--- 2.8463 seconds --- MainThread 
13487957813644386002497605118558198407322675045349... 
--- 2.7690 seconds --- MainThread 
15058500261169362071147461573764693796710045625582... 
--- 2.7372 seconds --- MainThread 
77481355564746169357229771752308217188584725215300... 
--- 2.7168 seconds --- MainThread 
[Finished in 11.1s] 

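Why do the single-threaded and multithreaded scripts take the same processing time? Shouldn't the multithreaded implementation take only about 1/(number of threads) of the time? (I know there are diminishing returns once you reach your CPU's maximum thread count.)

Did I mess up my implementation?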

+0

The threads don't run in parallel because of CPython's _Global Interpreter Lock_. This is a well-known limitation of CPython. Try multiprocessing. –

+0

OK, if you post an answer with a link or a multiprocessing example, I'll accept it. You've already answered it. – c3cris

+1

https://docs.python.org/2/library/multiprocessing.html –

Answers

2

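Multithreading in Python doesn't work the way it does in other languages; if I recall correctly, this has to do with the global interpreter lock. There are a number of workarounds, though. For example, you can use gevent's coroutine based "threads". I personally prefer dask for work that needs to run concurrently. For example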

import time
import dask.bag as db
start = time.time() 
(db.from_sequence(range(4), npartitions=4) 
    .map(lambda _: dowork()) 
    .compute()) 
print('total time: {} seconds'.format(time.time() - start)) 

start = time.time() 
threads = [] 
for x in range(0,4): 
    threads.append(True) 

for x in threads: 
    dowork() 
print('total time: {} seconds'.format(time.time() - start)) 

and the output

19016975777667561989667836343447216065093401859905... 
--- 2.4172 seconds --- MainThread 
32883203981076692018141849036349126447899294175228... 
--- 2.4685 seconds --- MainThread 
34450410116136243300565747102093690912732970152596... 
--- 2.4901 seconds --- MainThread 
50964938446237359434550325092232546411362261338846... 
--- 2.5317 seconds --- MainThread 
total time: 2.5557193756103516 seconds 
10380860937556820815021239635380958917582122217407... 
--- 2.3711 seconds --- MainThread 
13309313630078624428079401365574221411759423165825... 
--- 2.2861 seconds --- MainThread 
27410752090906837219181398184615017013303570495018... 
--- 2.2853 seconds --- MainThread 
73007436394172372391733482331910124459395132986470... 
--- 2.3136 seconds --- MainThread 
total time: 9.256525993347168 seconds 

In this case dask uses multiprocessing to do the work, which may or may not be desirable for your situation.
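For comparison, roughly the same fan-out can be sketched with only the standard library, using concurrent.futures.ProcessPoolExecutor. This is not a claim about what dask does internally, just an assumed stand-in. The names cpu_task and run_in_processes are hypothetical, and the workload is a smaller placeholder for dowork().

```python
import random
import time
from concurrent.futures import ProcessPoolExecutor


def cpu_task(seed):
    """CPU-bound placeholder for dowork(): sum of large integer powers."""
    rng = random.Random(seed)  # seeded so each task is reproducible
    total = 0
    for _ in range(200):
        total += rng.randint(0, 100000) ** rng.randint(0, 1000)
    # Return a small value; results are pickled on the way back to the parent.
    return total % 1000003


def run_in_processes(seeds, workers=4):
    """Run cpu_task once per seed, spread over `workers` processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(cpu_task, seeds))


if __name__ == "__main__":
    # The guard matters on platforms where child processes re-import this module.
    start = time.time()
    results = run_in_processes(range(4))
    print("results:", results)
    print("total time: {:.3f} seconds".format(time.time() - start))
```

Each task runs in its own interpreter process with its own GIL, so the four tasks can occupy four cores. The cost is process startup plus pickling of arguments and return values.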

You could also try other Python implementations, such as pypy, stackless python, etc., which claim to offer workarounds for this problem.

+0

What about https://docs.python.org/3/library/multiprocessing.html? Would that allow parallelism without the GIL? – c3cris

+0

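multiprocessing uses multiple processes to achieve parallelism. However, starting a new process can take longer than spawning a new thread, and you may also need to work out how to let the processes communicate with each other (if applicable). The solution I suggested uses 'dask.bag', which uses 'multiprocessing' behind the scenes. – Jeffrey04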

+0

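So with dask.bag the processes talk to each other under the hood, so I don't have to handle that? Is it done by sharing serialized data? Local sockets? Just curious. – c3cris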
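On the communication question, here is a minimal sketch of how the standard library's multiprocessing moves data between processes. It says nothing about dask's internals, which are not covered in this thread. With multiprocessing.Queue, objects are pickled and sent through a pipe. The names square_worker and squares_via_queue are hypothetical.

```python
import multiprocessing


def square_worker(inbox, outbox):
    """Read numbers from inbox until the None sentinel, send back squares."""
    for item in iter(inbox.get, None):
        outbox.put((item, item * item))


def squares_via_queue(numbers):
    """Send a list of numbers to one child process and collect the squares."""
    inbox = multiprocessing.Queue()
    outbox = multiprocessing.Queue()
    proc = multiprocessing.Process(target=square_worker, args=(inbox, outbox))
    proc.start()
    for n in numbers:
        inbox.put(n)      # pickled, then written to a pipe
    inbox.put(None)       # sentinel: tells the worker to stop
    results = dict(outbox.get() for _ in numbers)
    proc.join()
    return results


if __name__ == "__main__":
    print(squares_via_queue([1, 2, 3]))
```

Anything put on the queue has to be picklable, which is one reason passing large objects between processes can eat into the speedup.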

0

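In CPython, threads do not run in parallel because of the Global Interpreter Lock. From the Python wiki (https://wiki.python.org/moin/GlobalInterpreterLock):

In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This is mainly because CPython's memory management is not thread-safe.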
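A small timing sketch can make the effect visible. It assumes a standard GIL-enabled CPython build; count_down, time_sequential and time_threaded are hypothetical names, and the loop size is arbitrary.

```python
import threading
import time


def count_down(n):
    """Pure-Python busy loop; it holds the GIL while executing bytecode."""
    while n > 0:
        n -= 1


def time_sequential(n, runs):
    """Wall time for calling count_down(n) `runs` times, one after another."""
    start = time.perf_counter()
    for _ in range(runs):
        count_down(n)
    return time.perf_counter() - start


def time_threaded(n, runs):
    """Wall time for `runs` threads that each call count_down(n)."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(runs)]
    start = time.perf_counter()
    for th in threads:
        th.start()
    for th in threads:
        th.join()  # wait for every thread so the wall time is complete
    return time.perf_counter() - start


if __name__ == "__main__":
    n, runs = 2_000_000, 4
    print("sequential: %.3f s" % time_sequential(n, runs))
    print("threaded:   %.3f s" % time_threaded(n, runs))
```

With the GIL in place, only one thread executes Python bytecode at a time, so the threaded figure should land near the sequential one rather than at a quarter of it. Note the join() calls: without them the main thread's timer would stop before the workers finish.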

0

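Here is a complete test and example comparing multithreading and multiprocessing against a single thread/process.

The computation (you can substitute any computation you like):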

import time, os, threading, random, multiprocessing 

def dowork(): 
    total = 0 
    start_time = time.time() 
    t = threading.current_thread() 
    p = multiprocessing.current_process() 
    for x in range(0, 100):
        total += random.randint(1000000-1, 1000000) ** random.randint(37000-1, 37000)
    print("--- %.6s seconds DONE --- %s | %s" % (time.time() - start_time, p.name, t.name)) 

Test:

# guard needed on platforms where multiprocessing starts fresh interpreters (e.g. Windows)
if __name__ == "__main__":
    t, p = [], []
    for x in range(0, 4):
        # create thread
        t.append(threading.Thread(target=dowork))
        # create child process
        p.append(multiprocessing.Process(target=dowork))

    # multi-thread
    start_time = time.time()
    for l in t:
        l.start()
    for l in t:
        l.join()
    print("===== %.6s seconds Multi-Threads =====" % (time.time() - start_time))

    # multi-process
    start_time = time.time()
    for l in p:
        l.start()
    for l in p:
        l.join()
    print("===== %.6s seconds Multi-Processes =====" % (time.time() - start_time))

    # sequential: same number of runs, all in the main thread
    start_time = time.time()
    for l in p:
        dowork()
    print("===== %.6s seconds Single Process/Thread =====" % (time.time() - start_time))

Here is sample output:

#Sample Output: 

--- 2.6412 seconds DONE --- MainProcess | Thread-1 
--- 2.5712 seconds DONE --- MainProcess | Thread-2 
--- 2.5774 seconds DONE --- MainProcess | Thread-3 
--- 2.5973 seconds DONE --- MainProcess | Thread-4 
===== 10.388 seconds Multi-Threads ===== 
--- 2.4816 seconds DONE --- Process-4 | MainThread 
--- 2.4841 seconds DONE --- Process-3 | MainThread 
--- 2.4965 seconds DONE --- Process-2 | MainThread 
--- 2.5182 seconds DONE --- Process-1 | MainThread 
===== 2.5241 seconds Multi-Processes ===== 
--- 2.4624 seconds DONE --- MainProcess | MainThread 
--- 2.6447 seconds DONE --- MainProcess | MainThread 
--- 2.5716 seconds DONE --- MainProcess | MainThread 
--- 2.4369 seconds DONE --- MainProcess | MainThread 
===== 10.115 seconds Single Process/Thread ===== 
[Finished in 23.1s]