Scrapy does not generate links correctly. My CrawlSpider:
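# Imports assumed (module paths as used by Scrapy 0.x at the time of the post)
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor


class FabulousFoxSpider(CrawlSpider):
    """docstring for EventsSpider"""
    name = "fabulousfox"
    allowed_domains = ["fabulousfox.com"]
    start_urls = ["http://www.fabulousfox.com"]
    rules = (
        # Follow only the show pages and hand them to parse_fabulousfox
        Rule(SgmlLinkExtractor(
            allow=(
                r'/shows_page_(single|multi).aspx\?usID=(\d)*'
            ),
            unique=True),
            'parse_fabulousfox',
        ),
    )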
But when I run scrapy crawl fabulousfox -o data.json -t json
I get this output:
...................
......................
2014-03-01 13:11:56+0530 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2014-03-01 13:11:56+0530 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2014-03-01 13:11:57+0530 [fabulousfox] DEBUG: Crawled (200) <GET http://www.fabulousfox.com> (referer: None)
2014-03-01 13:11:57+0530 [fabulousfox] DEBUG: Crawled (403) <GET http://www.fabulousfox.com/../shows_page_multi.aspx?usID=365> (referer: http://www.fabulousfox.com)
2014-03-01 13:11:58+0530 [fabulousfox] DEBUG: Crawled (403) <GET http://www.fabulousfox.com/../shows_page_single.aspx?usID=389> (referer: http://www.fabulousfox.com)
2014-03-01 13:11:58+0530 [fabulousfox] DEBUG: Crawled (403) <GET http://www.fabulousfox.com/../shows_page_multi.aspx?usID=388> (referer: http://www.fabulousfox.com)
2014-03-01 13:11:58+0530 [fabulousfox] DEBUG: Crawled (403) <GET http://www.fabulousfox.com/../shows_page_single.aspx?usID=394> (referer: http://www.fabulousfox.com)
2014-03-01 13:11:58+0530 [fabulousfox] DEBUG: Crawled (403) <GET http://www.fabulousfox.com/../shows_page_multi.aspx?usID=358> (referer: http://www.fabulousfox.com)
2014-03-01 13:11:58+0530 [fabulousfox] INFO: Closing spider (finished)
2014-03-01 13:11:58+0530 [fabulousfox] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 1660,
'downloader/request_count': 6,
'downloader/request_method_count/GET': 6,
'downloader/response_bytes': 12840,
'downloader/response_count': 6,
'downloader/response_status_count/200': 1,
'downloader/response_status_count/403': 5,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2014, 3, 1, 7, 41, 58, 218296),
'log_count/DEBUG': 8,
'log_count/INFO': 7,
'memdebug/gc_garbage_count': 0,
'memdebug/live_refs/FabulousFoxSpider': 1,
'memusage/max': 33275904,
'memusage/startup': 33275904,
'request_depth_max': 1,
'response_received_count': 6,
'scheduler/dequeued': 6,
'scheduler/dequeued/memory': 6,
'scheduler/enqueued': 6,
'scheduler/enqueued/memory': 6,
'start_time': datetime.datetime(2014, 3, 1, 7, 41, 56, 360266)}
2014-03-01 13:11:58+0530 [fabulousfox] INFO: Spider closed (finished)
Why do the generated URLs contain "..", as in:
http://www.fabulousfox.com/../shows_page_multi.aspx?usID=365
Also, it does not generate all of the URLs. What is wrong here?
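I ran into the same problem with the new version of Scrapy. – 2014-03-01 07:48:41
I doubt it has anything to do with the Scrapy version. – mrudult
Possible duplicate of [Python Scrapy: convert relative paths to absolute paths](http://stackoverflow.com/questions/6499603/python-scrapy-convert-relative-paths-to-absolute-paths) –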
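One possible workaround, sketched under assumptions: if the home page links to the show pages with hrefs such as ../shows_page_multi.aspx?usID=365 (an assumption, the page source is not shown in the question), joining them against http://www.fabulousfox.com leaves a /../ at the start of the path. The log shows those requests coming back with 403. Scrapy's link extractors accept a process_value callable, so a small helper could drop the leading parent references before the request is scheduled. The helper name strip_parent_refs is hypothetical, and the sketch is written for Python 3. Depending on the extractor and Scrapy version, the callable may receive either the raw href or the already-joined URL, so the sketch handles both.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_parent_refs(value):
    """Remove '../' segments that sit at the very start of the URL path.

    Hypothetical helper: works on a raw relative href or an absolute URL,
    and leaves '..' segments deeper in the path untouched.
    """
    parts = urlsplit(value)
    path = parts.path
    # '../x' -> 'x' and '/../x' -> '/x'; both cases drop three characters.
    while path.startswith('../') or path.startswith('/../'):
        path = path[3:]
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


# Assumed wiring into the spider from the question (not executed here):
# Rule(SgmlLinkExtractor(allow=(...), unique=True,
#                        process_value=strip_parent_refs),
#      'parse_fabulousfox')

if __name__ == "__main__":
    print(strip_parent_refs('../shows_page_multi.aspx?usID=365'))
    print(strip_parent_refs(
        'http://www.fabulousfox.com/../shows_page_multi.aspx?usID=365'))
```

This only addresses the ".." in the generated URLs. Whether the server then answers 200 instead of 403, and why some show links are not extracted at all, cannot be told from the log alone.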