Python: How do I find out which URL failed when using urllib2 and pool.map?

I want to call 3 URLs concurrently and log any errors. Here is my sample code:

urls = ["https://example.com/gives200.php", "https://example.com/alsogives200.php", "https://example.com/gives500.php"]; 

try: 
    results = pool.map(urllib2.urlopen, urls); 
except URLError: 
    urllib2.urlopen("https://example.com/log_error/?url="+URLError.url);

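I just want to know which URLs (if any) failed, so that I can report them by calling the /log_error/ URL. But with this code I get an error saying that URLError is not defined.

I do have these imports at the top of my code: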

import urllib2 
from multiprocessing.dummy import Pool as ThreadPool

Here is my full error response (this is running on AWS Lambda, for what it's worth):

{ 
    "stackTrace": [ 
    [ 
     "/var/task/lambda_function.py", 
     27, 
     "lambda_handler", 
     "except Error as e:" 
    ] 
    ], 
    "errorType": "NameError", 
    "errorMessage": "global name 'URLError' is not defined" 
}

How do I catch the URLs that failed, so I know which ones they are?

UPDATE

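I figured it out: the name URLError was never imported into my module. In Python 2 the class lives in urllib2, so it has to be written as urllib2.URLError (or imported with from urllib2 import URLError). urllib.error is where the class moved in Python 3.

The note at the top of this documentation page explains the Python 3 split: https://docs.python.org/2/library/urllib2.html

Here is more detail on the HTTPError object, which is what I actually get: https://docs.python.org/2/library/urllib2.html#urllib2.HTTPError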
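As a minimal sketch of where these exception classes live: the try/except import below is an assumption added only so the snippet runs on either Python 2 or Python 3, and the variable name catches_http_errors is illustrative.

```python
# Import the exception classes from wherever this Python version keeps them.
try:
    from urllib2 import URLError, HTTPError       # Python 2
except ImportError:
    from urllib.error import URLError, HTTPError  # Python 3

# HTTPError is a subclass of URLError, so "except URLError" also catches
# HTTP status errors such as a 500 response.
catches_http_errors = issubclass(HTTPError, URLError)
print(catches_http_errors)
```

Because of this relationship, an "except HTTPError" clause has to come before "except URLError" if the two cases should be handled differently.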

The question itself still stands, though: at the moment I have no way to determine which URL caused the error.

UPDATE 2

Apparently str(e.url) is what I needed. I did not find any documentation about it; it was purely a lucky guess on my part.

So this is the working code now:

urls = ["https://example.com/gives200.php", "https://example.com/alsogives200.php", "https://example.com/gives500.php"]; 

try: 
    results = pool.map(urllib2.urlopen, urls); 
except Exception as e: 
    urllib2.urlopen("https://example.com/log_error/?url="+str(e.url)+"&code="+str(e.code)+"&reason="+e.reason);
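The code above concatenates the failing URL straight into a query string, which can break if the URL or the reason contains characters such as "&" or spaces. A possible sketch of building the log URL with urlencode follows. The helper name build_log_url, the constant LOG_ENDPOINT, and the hand-built HTTPError are assumptions for illustration; the dual imports are only there so the snippet runs on Python 2 or 3.

```python
import io

try:
    from urllib import urlencode                   # Python 2
    from urllib2 import HTTPError
except ImportError:
    from urllib.parse import urlencode             # Python 3
    from urllib.error import HTTPError

LOG_ENDPOINT = "https://example.com/log_error/"    # endpoint from the question


def build_log_url(error):
    """Return the /log_error/ URL describing one failed HTTP request."""
    params = [
        ("url", error.url),
        ("code", error.code),
        ("reason", str(error.reason)),
    ]
    # urlencode escapes characters such as ":", "/" and spaces.
    return LOG_ENDPOINT + "?" + urlencode(params)


# Hand-built HTTPError standing in for a real failed request (assumption).
sample = HTTPError("https://example.com/gives500.php", 500,
                   "Internal Server Error", {}, io.BytesIO(b""))
log_url = build_log_url(sample)
print(log_url)
```

Note that url and code exist on HTTPError only; a plain URLError (for example a DNS failure) carries a reason but no url, so this helper assumes it is given an HTTPError.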

UPDATE 3

Thanks to @mfripp for informing me about the dangers of pool.map, I have revised the code once more to this:

def my_urlopen(url):
    try:
        return urllib2.urlopen(url)
    except urllib2.URLError:
        # Report the failing URL, then mark this request as failed.
        urllib2.urlopen("https://example.com/log_error/?url=" + url)
        return None

def lambda_handler(event, context):

    urls = [
        "https://example.com/gives200.php",
        "https://example.com/alsogives200.php",
        "https://example.com/gives500.php"
    ]

    # Map the wrapper, not urllib2.urlopen, so one failure cannot abort the rest.
    results = pool.map(my_urlopen, urls)

    return urls

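Source

2017-04-26 Bing

I am not sure whether the exception object will give you details about the URL that failed. If not, you need to wrap each call to urllib2.urlopen(url) in a try and except. You could do it like this: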

urls = [ 
    "https://example.com/gives200.php", 
    "https://example.com/alsogives200.php", 
    "https://example.com/gives500.php" 
] 

def my_urlopen(url):
    try:
        return urllib2.urlopen(url)
    except urllib2.URLError:
        # Report the failing URL and return None in place of a response.
        urllib2.urlopen("https://example.com/log_error/?url=" + url)
        return None

results = pool.map(my_urlopen, urls) 
# At this point, any failed requests will have None as their value
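A variation on the same idea, as a rough sketch rather than a definitive implementation: instead of returning only None, the wrapper can return the URL together with the error, so the caller can list exactly which URLs failed. The names fetch, failed_urls, and fake_opener are hypothetical, and fake_opener replaces the real urlopen only so the example runs without network access.

```python
from multiprocessing.dummy import Pool as ThreadPool  # thread-based pool

try:
    from urllib2 import urlopen, URLError         # Python 2
except ImportError:
    from urllib.request import urlopen            # Python 3
    from urllib.error import URLError


def fetch(url, opener=urlopen):
    """Return (url, response, error); exactly one of the last two is None."""
    try:
        return url, opener(url), None
    except URLError as e:
        return url, None, e


def failed_urls(urls, opener=urlopen, workers=3):
    """Try every URL and return the ones that raised URLError."""
    pool = ThreadPool(workers)
    try:
        outcomes = pool.map(lambda u: fetch(u, opener), urls)
    finally:
        pool.close()
        pool.join()
    return [url for url, _response, error in outcomes if error is not None]


# Hypothetical stand-in for urlopen so the sketch runs without a network.
def fake_opener(url):
    if "gives500" in url:
        raise URLError("simulated failure")
    return "fake response for " + url


urls = [
    "https://example.com/gives200.php",
    "https://example.com/alsogives200.php",
    "https://example.com/gives500.php",
]
print(failed_urls(urls, opener=fake_opener))
```

With the real urlopen as the opener, the successful responses would still need to be read and closed by the caller.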

Source

2017-04-26 21:00:18

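from multiprocessing import Pool
import urllib2

# Fetch one URL; runs inside a pool worker.
def async_request(url):
    try:
        request = urllib2.Request(url)
        response = urllib2.urlopen(request)
        print response.info()
    except urllib2.URLError as e:
        # Report the failing URL instead of silently swallowing the error.
        print "request failed: %s (%s)" % (url, e)

# links is the list of URLs to fetch; it is not defined in this snippet.
pool = Pool()
pool.map(async_request, links)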

Source

2017-04-26 20:49:16

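How does this work with pool.map? – Bing

EDIT: see Update 3 above. This needs to be merged with mfripp's answer to be fully complete.

I updated the original post to explain, but this is the exact code I needed. I could not find any documentation that led me to e.url; it was just a lucky guess on my part.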

urls = [ 
    "https://example.com/gives200.php", 
    "https://example.com/alsogives200.php", 
    "https://example.com/gives500.php" 
]; 

try: 
    results = pool.map(urllib2.urlopen, urls); 
except Exception as e: 
    urllib2.urlopen("https://example.com/log_error/?url="+str(e.url)+"&code="+str(e.code)+"&reason="+e.reason);

Source

2017-04-26 21:05:05 Bing

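When 'pool.map' encounters an exception, it raises that exception and then terminates all other tasks. So with this code, you may find that some URLs are never tried. If you want to try every URL and log every one that produces an error, you need something like one of the other two answers. –

That is very good to know, thanks! I will rewrite my "exact code" solution and then accept your answer. – Bing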
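The practical consequence of the comment above can be sketched without any network calls. The example only demonstrates that an exception escaping pool.map leaves the caller with no result list, while a wrapper that catches the exception yields one entry per input; it makes no claim about what happens to the other tasks internally. The functions flaky and safe are hypothetical stand-ins for urlopen and its wrapper.

```python
from multiprocessing.dummy import Pool as ThreadPool


def flaky(n):
    """Hypothetical task: fails for one input, succeeds for the others."""
    if n == 2:
        raise ValueError("task %d failed" % n)
    return n * 10


def safe(n):
    """Wrapper that converts an exception into a None result."""
    try:
        return flaky(n)
    except ValueError:
        return None


pool = ThreadPool(3)

# Unwrapped: the exception escapes pool.map, so no result list is returned.
results = None
caught = None
try:
    results = pool.map(flaky, [1, 2, 3])
except ValueError as e:
    caught = str(e)

# Wrapped: every input gets an entry, and failures show up as None.
safe_results = pool.map(safe, [1, 2, 3])

pool.close()
pool.join()

print(results)
print(caught)
print(safe_results)
```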
