I'm running into a problem with Scrapy: for some reason it never enters my parse method, and I can't figure out why. I've tried several variations without success.
This is what my code looks like right now. Note the two print statements; the one inside parse() never runs.
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy import log
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from comments.items import CustomerReview
import re

class AppidSpider(BaseSpider):
    name = "appid"
    allowed_domains = ["itunes.apple.com"]
    start_urls = [
        "http://itunes.apple.com/us/genre/ios/id36?mt=8"
    ]
    rules = [Rule(SgmlLinkExtractor(), follow=True, callback='parse')]

    print "---> THIS IS TEST 1"

    def parse(self, response):
        print " ----> THIS IS TEST 2"
        # ... More code afterwards
And here is the output. As you can see, TEST 2 is never printed.
$ scrapy crawl appid
2012-07-05 13:41:02+0000 [scrapy] INFO: Scrapy 0.14.4 started (bot: comments)
2012-07-05 13:41:02+0000 [scrapy] DEBUG: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, MemoryUsage, SpiderState
---> THIS IS TEST 1
2012-07-05 13:41:02+0000 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpCompressionMiddleware, ChunkedTransferMiddleware, DownloaderStats
2012-07-05 13:41:02+0000 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2012-07-05 13:41:02+0000 [scrapy] DEBUG: Enabled item pipelines: FilterWordsPipeline
2012-07-05 13:41:02+0000 [appid] INFO: Spider opened
2012-07-05 13:41:02+0000 [appid] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2012-07-05 13:41:02+0000 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2012-07-05 13:41:02+0000 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2012-07-05 13:41:02+0000 [appid] DEBUG: Crawled (200) <GET http://itunes.apple.com/us/genre/ios/id36?mt=8> (referer: None)
2012-07-05 13:41:02+0000 [appid] INFO: Closing spider (finished)
2012-07-05 13:41:02+0000 [appid] INFO: Dumping spider stats:
{'downloader/request_bytes': 222,
'downloader/request_count': 1,
'downloader/request_method_count/GET': 1,
'downloader/response_bytes': 9927,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2012, 7, 5, 13, 41, 2, 694678),
'scheduler/memory_enqueued': 1,
'start_time': datetime.datetime(2012, 7, 5, 13, 41, 2, 604025)}
2012-07-05 13:41:02+0000 [appid] INFO: Spider closed (finished)
2012-07-05 13:41:02+0000 [scrapy] INFO: Dumping global stats:
{'memusage/max': 95318016, 'memusage/startup': 95318016}
If you use rules, you should inherit from CrawlSpider instead of BaseSpider, and pick a different name for your parse method (not 'parse'). – 2012-07-06 01:39:39
Thanks, that also explains the second problem I'm facing right now. – 2012-07-06 18:33:26