2016-10-01

How quickly does python-requests close its sockets?

I'm trying to do some work with Python requests. Here is my code:

import threading
import resource
import time
import sys
import requests

#maximum Open File Limit for thread limiter. 
maxOpenFileLimit = resource.getrlimit(resource.RLIMIT_NOFILE)[0] # For example, it shows 50. 

# Will use one session for every Thread. 
requestSessions = requests.Session() 
# Making requests Pool bigger to prevent [Errno -3] when socket stacked in CLOSE_WAIT status. 
adapter = requests.adapters.HTTPAdapter(pool_maxsize=(maxOpenFileLimit+100)) 
requestSessions.mount('http://', adapter) 
requestSessions.mount('https://', adapter) 

def threadAction(a1, a2):
    global number
    time.sleep(1)  # My actions with Requests for each thread.
    number = number + 1
    print number

number = 0 # Count of complete actions 

ThreadActions = []  # Action tasks.
for i in range(50):  # I have 50 websites I need to do in parallel threads.
    a1 = i
    for n in range(10):  # Every website I need to do in 10 threads.
        a2 = n
        ThreadActions.append(threading.Thread(target=threadAction, args=(a1, a2)))


for item in ThreadActions:
    # But I can't do more than 50 Threads at once, because of maxOpenFileLimit.
    while True:
        # Thread limiter, analogue of BoundedSemaphore.
        if threading.activeCount() < maxOpenFileLimit:
            item.start()
            break

for item in ThreadActions: 
    item.join() 

But the thing is, once I reach 50 threads, the thread limiter starts waiting for some thread to finish its work. And here is the problem: after the script gets to the limiter, `lsof -i | grep python | wc -l` shows far fewer than 50 active connections, while before the limiter it showed all <= 50 of them. Why does this happen? Or should I use requests.close() instead of requests.session() to stop it from holding sockets that are already done?


Your thread limiter goes into a tight loop and eats most of the processing time. Try slowing it down with something like `sleep(.1)`. Better yet, use a queue limited to 50 entries and have your threads read requests from it. – tdelaney
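The bounded-queue pattern tdelaney suggests can be sketched roughly like this (Python 3 syntax; the worker body, the task tuples, and the pool size of 5 are placeholders for the real per-site request logic):

```python
import queue
import threading

def worker(tasks, results):
    # Each worker pulls tasks until it sees the None sentinel, then exits.
    while True:
        item = tasks.get()
        if item is None:
            break
        a1, a2 = item
        results.append((a1, a2))  # placeholder for the real request work

tasks = queue.Queue(maxsize=50)  # producer blocks once 50 tasks are queued
results = []
workers = [threading.Thread(target=worker, args=(tasks, results))
           for _ in range(5)]
for w in workers:
    w.start()

for i in range(10):
    tasks.put((i, 0))  # blocks instead of busy-waiting when the queue is full

for _ in workers:
    tasks.put(None)    # one sentinel per worker
for w in workers:
    w.join()

print(len(results))  # 10
```

The key difference from the busy-wait loop above is that `Queue.put` and `Queue.get` block on a condition variable, so idle threads consume no CPU while they wait.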


On raising your user's OS-level limit, look up [ulimit](http://stackoverflow.com/questions/6774724/why-python-has-limit-for-count-of-file-handles) and [fs.file-max](https://cs.uwaterloo.ca/~brecht/servers/openfiles.html). After that, for raising the limit from inside Python, look up [setrlimit](https://coderwall.com/p/ptq7rw/increase-open-files-limit-and-drop-privileges-in-python). And of course, make sure you are not needlessly running a busy while-loop and that you multiplex your code properly. – blackpen
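A minimal sketch of the `setrlimit` approach blackpen mentions, assuming a hypothetical target of 4096 descriptors (an unprivileged process can only raise its soft limit up to the hard limit; raising the hard limit itself requires root):

```python
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(soft, hard)

# Pick a target soft limit, capped at the hard limit.  4096 is an
# arbitrary illustrative value, not a recommendation.
if hard == resource.RLIM_INFINITY:
    target = 4096
else:
    target = min(4096, hard)

# Only the soft limit changes; the hard limit is passed through unchanged.
resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])
```

After this call, `resource.getrlimit(resource.RLIMIT_NOFILE)[0]` reports the new soft limit, which is what the question's `maxOpenFileLimit` reads.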


Yes, I understand, and in my real script I use BoundedSemaphore. But why, after the script reaches the limit, does `lsof -i | grep python | wc -l` show a much lower number? – passwd

Answer


Your limiter is a tight loop that takes up most of the processing time. Use a thread pool to limit the number of workers instead.

import multiprocessing.pool
import resource
import time
import requests

# Carried over from the question's setup.
maxOpenFileLimit = resource.getrlimit(resource.RLIMIT_NOFILE)[0]

# Will use one session for every Thread. 
requestSessions = requests.Session() 
# Making requests Pool bigger to prevent [Errno -3] when socket stacked in CLOSE_WAIT status. 
adapter = requests.adapters.HTTPAdapter(pool_maxsize=(maxOpenFileLimit+100)) 
requestSessions.mount('http://', adapter) 
requestSessions.mount('https://', adapter) 

def threadAction(a1, a2):
    global number
    time.sleep(1)  # My actions with Requests for each thread.
    number = number + 1  # DEBUG: this update is not thread safe without a
    print number         # lock; better to return values and sum pool.map's result.

number = 0 # Count of complete actions 

pool = multiprocessing.pool.ThreadPool(50)

ThreadActions = []  # Action tasks.
for i in range(50):  # I have 50 websites I need to do in parallel threads.
    a1 = i
    for n in range(10):  # Every website I need to do in 10 threads.
        a2 = n
        ThreadActions.append((a1, a2))

pool.map(lambda args: threadAction(*args), ThreadActions, chunksize=1)
pool.close()
pool.join()

Does multiprocessing work faster than threading? How does it affect processor load? – passwd


It's a tradeoff... and Windows is different from Linux. With multiprocessing, data needs to be serialized between parent and child (and on Windows, much more context typically has to be serialized, because the child does not get a clone of the parent's memory space), but you don't have to worry about the GIL. Higher CPU load and/or lower data overhead make multiprocessing work out better. But if you are mostly I/O bound, a thread pool is fine. – tdelaney
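For the I/O-bound case described above, `multiprocessing.pool.ThreadPool` offers the `Pool` API without the inter-process serialization cost. A rough sketch, where `fetch` is a hypothetical stand-in for the real request call and the task tuples mirror the question's `(a1, a2)` pairs:

```python
from multiprocessing.pool import ThreadPool

def fetch(args):
    # Stand-in for an I/O-bound request; unpacks an (a1, a2) tuple.
    a1, a2 = args
    return a1 * 10 + a2

# Threads share memory, so arguments and results are passed by reference;
# a multiprocessing.Pool would have to pickle them across process boundaries.
pool = ThreadPool(4)
results = pool.map(fetch, [(i, n) for i in range(3) for n in range(2)])
pool.close()
pool.join()
print(results)  # [0, 1, 10, 11, 20, 21]
```

Swapping `ThreadPool` for `multiprocessing.Pool` keeps the same `map`/`close`/`join` API but moves the workers into child processes, which only pays off when the work is CPU bound enough to be limited by the GIL.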