2016-01-24 74 views
import asyncio 
import aiohttp 
import bs4 
import tqdm 


@asyncio.coroutine 
def get(*args, **kwargs): 
    response = yield from aiohttp.request('GET', *args, **kwargs) 
    return (yield from response.read_and_close(decode=True)) 


@asyncio.coroutine 
def wait_with_progress(coros): 
    for f in tqdm.tqdm(asyncio.as_completed(coros), total=len(coros)): 
        yield from f


def first_magnet(page): 
    soup = bs4.BeautifulSoup(page) 
    a = soup.find('a', title='Download this torrent using magnet') 
    return a['href'] 


@asyncio.coroutine 
def print_magnet(query): 
    url = 'http://thepiratebay.se/search/{}/0/7/0'.format(query) 
    with (yield from sem): 
        page = yield from get(url, compress=True)
    magnet = first_magnet(page) 
    print('{}: {}'.format(query, magnet)) 

distros = ['archlinux', 'ubuntu', 'debian'] 
sem = asyncio.Semaphore(5) 
loop = asyncio.get_event_loop() 
f = asyncio.wait([print_magnet(d) for d in distros]) 
loop.run_until_complete(f) 

Python crawler does not work with asyncio. Nothing is returned. The errors are below:

C:\Python34\python.exe C:/Users/Marco/PycharmProjects/untitled3/crawler.py 
Unclosed connection 
client_connection: Connection<('thepiratebay.se', 443, True)> 
Unclosed response 
client_response: <ClientResponse(https://thepiratebay.se/search/archlinux/0/7/0) [200 OK]> 
<CIMultiDictProxy('SERVER': 'cloudflare-nginx', 'DATE': 'Sun, 24 Jan 2016 03:17:30 GMT', 'CONTENT-TYPE': 'text/html; charset=UTF-8', 'TRANSFER-ENCODING': 'chunked', 'CONNECTION': 'keep-alive', 'SET-COOKIE': 'PHPSESSID=72fd62ba4a13c716576868e13d00a3ae; path=/; domain=.thepiratebay.se', 'EXPIRES': 'Thu, 19 Nov 1981 08:52:00 GMT', 'CACHE-CONTROL': 'private, max-age=10800, pre-check=10800', 'LAST-MODIFIED': 'Sun, 15 Mar 2015 05:20:08 GMT', 'SET-COOKIE': 'language=en_EN; expires=Mon, 23-Jan-2017 03:17:32 GMT; path=/; domain=.thepiratebay.se', 'VARY': 'Accept-Encoding', 'CF-RAY': '269895ee698e32dd-HKG', 'CONTENT-ENCODING': 'gzip')> 

Unclosed connection 
client_connection: Connection<('thepiratebay.se', 443, True)> 
Unclosed response 
client_response: <ClientResponse(https://thepiratebay.se/search/debian/0/7/0) [200 OK]> 
<CIMultiDictProxy('SERVER': 'cloudflare-nginx', 'DATE': 'Sun, 24 Jan 2016 03:17:30 GMT', 'CONTENT-TYPE': 'text/html; charset=UTF-8', 'TRANSFER-ENCODING': 'chunked', 'CONNECTION': 'keep-alive', 'SET-COOKIE': 'PHPSESSID=52751957860238a12a8bff265f19a3b8; path=/; domain=.thepiratebay.se', 'EXPIRES': 'Thu, 19 Nov 1981 08:52:00 GMT', 'CACHE-CONTROL': 'private, max-age=10800, pre-check=10800', 'LAST-MODIFIED': 'Sun, 15 Mar 2015 05:20:08 GMT', 'SET-COOKIE': 'language=en_EN; expires=Mon, 23-Jan-2017 03:17:31 GMT; path=/; domain=.thepiratebay.se', 'VARY': 'Accept-Encoding', 'CF-RAY': '269895ee61921944-HKG', 'CONTENT-ENCODING': 'gzip')> 

Unclosed connection 
client_connection: Connection<('thepiratebay.se', 443, True)> 
Unclosed response 
client_response: <ClientResponse(https://thepiratebay.se/search/ubuntu/0/7/0) [200 OK]> 
<CIMultiDictProxy('SERVER': 'cloudflare-nginx', 'DATE': 'Sun, 24 Jan 2016 03:17:30 GMT', 'CONTENT-TYPE': 'text/html; charset=UTF-8', 'TRANSFER-ENCODING': 'chunked', 'CONNECTION': 'keep-alive', 'SET-COOKIE': 'PHPSESSID=1227bf9b240e1d057ea80b2605724913; path=/; domain=.thepiratebay.se', 'EXPIRES': 'Thu, 19 Nov 1981 08:52:00 GMT', 'CACHE-CONTROL': 'private, max-age=10800, pre-check=10800', 'LAST-MODIFIED': 'Sun, 15 Mar 2015 05:20:08 GMT', 'SET-COOKIE': 'language=en_EN; expires=Mon, 23-Jan-2017 03:17:32 GMT; path=/; domain=.thepiratebay.se', 'VARY': 'Accept-Encoding', 'CF-RAY': '269895ee7ae31944-HKG', 'CONTENT-ENCODING': 'gzip')> 

Task exception was never retrieved 
future: <Task finished coro=<print_magnet() done, defined at C:/Users/Marco/PycharmProjects/untitled3/crawler.py:25> exception=AttributeError("'ClientResponse' object has no attribute 'read_and_close'",)> 
Traceback (most recent call last): 
    File "C:\Python34\lib\asyncio\tasks.py", line 236, in _step 
    result = coro.send(value) 
    File "C:/Users/Marco/PycharmProjects/untitled3/crawler.py", line 29, in print_magnet 
    page = yield from get(url, compress=True) 
    File "C:/Users/Marco/PycharmProjects/untitled3/crawler.py", line 10, in get 
    return (yield from response.read_and_close(decode=True)) 
AttributeError: 'ClientResponse' object has no attribute 'read_and_close' 
Task exception was never retrieved 
future: <Task finished coro=<print_magnet() done, defined at C:/Users/Marco/PycharmProjects/untitled3/crawler.py:25> exception=AttributeError("'ClientResponse' object has no attribute 'read_and_close'",)> 
Traceback (most recent call last): 
    File "C:\Python34\lib\asyncio\tasks.py", line 236, in _step 
    result = coro.send(value) 
    File "C:/Users/Marco/PycharmProjects/untitled3/crawler.py", line 29, in print_magnet 
    page = yield from get(url, compress=True) 
    File "C:/Users/Marco/PycharmProjects/untitled3/crawler.py", line 10, in get 
    return (yield from response.read_and_close(decode=True)) 
AttributeError: 'ClientResponse' object has no attribute 'read_and_close' 
Task exception was never retrieved 
future: <Task finished coro=<print_magnet() done, defined at C:/Users/Marco/PycharmProjects/untitled3/crawler.py:25> exception=AttributeError("'ClientResponse' object has no attribute 'read_and_close'",)> 
Traceback (most recent call last): 
    File "C:\Python34\lib\asyncio\tasks.py", line 236, in _step 
    result = coro.send(value) 
    File "C:/Users/Marco/PycharmProjects/untitled3/crawler.py", line 29, in print_magnet 
    page = yield from get(url, compress=True) 
    File "C:/Users/Marco/PycharmProjects/untitled3/crawler.py", line 10, in get 
    return (yield from response.read_and_close(decode=True)) 
AttributeError: 'ClientResponse' object has no attribute 'read_and_close' 
Exception ignored in: Exception ignored in: Exception ignored in: 

Answer


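The aiohttp method ClientResponse.read_and_close() was removed as of December 2015; you can find this in the changelog. Based on the example given on their readthedocs site, I think it would be fine to change the line:

return (yield from response.read_and_close(decode=True))

to:

return (yield from response.text())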

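The readthedocs page has good examples to work from. Just keep in mind that, because you are using Python 3.4, the syntax will be a bit different: instead of await and async def, use yield from and the @asyncio.coroutine decorator, and you should be fine.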
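
For anyone reading this on Python 3.5 or later, here is a minimal sketch of the same fix written in the async def / await style mentioned above. It is an illustration under assumptions, not the asker's code: the names fetch_text, fetch_all and main and the limit parameter are made up for this example, and it assumes an aiohttp release that provides ClientSession. On Python 3.4 the same structure would be written with yield from and @asyncio.coroutine instead.

```python
import asyncio


async def fetch_text(session, url):
    # `session` is assumed to be an aiohttp.ClientSession (or anything with
    # a compatible `get`). Leaving the `async with` block releases the
    # response, which avoids the "Unclosed response" warnings.
    async with session.get(url) as response:
        # text() reads the whole body and decodes it; this is the call that
        # replaces the removed read_and_close(decode=True).
        return await response.text()


async def fetch_all(session, urls, limit=5):
    # Allow at most `limit` requests in flight, like the semaphore in the question.
    sem = asyncio.Semaphore(limit)

    async def bounded(url):
        async with sem:
            return await fetch_text(session, url)

    # gather() returns results in the order of `urls` and re-raises the first
    # exception, so a failure is not silently left inside a Task.
    return await asyncio.gather(*(bounded(u) for u in urls))


async def main(urls):
    # Imported here (an arbitrary choice for this sketch) so the helpers above
    # can be tried with a stand-in session when aiohttp is not installed.
    import aiohttp

    async with aiohttp.ClientSession() as session:
        return await fetch_all(session, urls)
```

To try it, something like asyncio.run(main(list_of_urls)) should return one page body per URL. Because gather() propagates exceptions to the caller, an error such as the AttributeError above would show up as a normal traceback rather than as "Task exception was never retrieved".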


Thank you very much, TrevorM – user3035661