I have to run a crawler and put the scraped data into a database. I have collected my data, but I have a problem inserting it into the database: I can't connect Scrapy to my database.
My files are:
topcrawlerspider.py (my crawler, which works fine):
from scrapy import Spider, Item, Field, Request
from ..items import TopcrawlerItem
from ..pipelines import TopcrawlerPipeline
import time

class TopSpider(Spider):
    name = 'topcrawler'
    start_urls = ['...']

    def __init__(self, page=0, *args, **kwargs):
        super(TopSpider, self).__init__(*args, **kwargs)
        self.search_result_url_tpl = 'http://.../%s'
        ...
settings.py:
BOT_NAME = 'topcrawler'
SPIDER_MODULES = ['topcrawler.spiders']
NEWSPIDER_MODULE = 'topcrawler.spiders'
# Crawl responsibly by identifying yourself (and your website) on the user-agent
#USER_AGENT = 'topcrawler (+http://www.yourdomain.com)'
# Obey robots.txt rules
ROBOTSTXT_OBEY = True
ITEM_PIPELINES = {
    'topcrawler.pipelines.TopcrawlerPipeline': 300,
    # 'topcrawler.pipelines.JsonWriterPipeline': 800,
}
MONGODB_URI = 'mongodb://root:[email protected]:8889/mtdbdd'
MONGO_DATABASE = 'mtdbdd'
pipelines.py:
import pymongo
from settings import *

class TopcrawlerPipeline(object):
    collection_name = 'land'

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        return cls(
            mongo_uri=crawler.settings.get('MONGO_URI'),
            mongo_db=crawler.settings.get('MONGO_DATABASE', 'items')
        )

    def open_spider(self, spider):
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        self.client.close()

    def process_item(self, item, spider):
        self.db[self.collection_name].insert(dict(item))
        return item
I get this error:
ServerSelectionTimeoutError: localhost:27017: [Errno 8] nodename nor servname provided, or not known
It seems it is not connecting on port 8889 as I intended, but I don't understand why...
Thanks for your help!
Hi! Thanks for your answer. I edited my pipelines.py file (as shown in my post edit), but I still get the same error :/ –
In your 'pipelines.py' you use 'MONGO_URI', but in 'settings.py' you define 'MONGODB_URI'. That is probably the source of the error. Please check it. –
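The comment above also explains why the traceback mentions localhost:27017 even though the URI specifies port 8889: `crawler.settings.get('MONGO_URI')` returns None for a setting name that was never defined, and `pymongo.MongoClient(None)` silently falls back to its default of localhost:27017. A minimal sketch of the mismatch, using a plain dict to stand in for `crawler.settings` (the URI value is the one from the question):

```python
# A plain dict stands in for crawler.settings to show the key mismatch.
settings = {
    'MONGODB_URI': 'mongodb://root:[email protected]:8889/mtdbdd',
    'MONGO_DATABASE': 'mtdbdd',
}

# The pipeline asks for 'MONGO_URI', which was never defined:
wrong = settings.get('MONGO_URI')      # None -> MongoClient defaults to localhost:27017
# Using the name actually defined in settings.py returns the real URI:
right = settings.get('MONGODB_URI')

print(wrong)   # None
print(right)   # mongodb://root:[email protected]:8889/mtdbdd
```

Renaming either side so both use the same key (e.g. `MONGODB_URI` in both files) should make the client connect to port 8889 as intended.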