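Scrapy pipeline does not insert into MySQL

I am trying to build a small application with Scrapy for a university project. The spider scrapes the items, but my pipeline does not insert the data into the MySQL database. To test whether the pipeline or the pymysql calls were at fault, I wrote a test script:

Code start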

#!/usr/bin/python3 

import pymysql 

str1 = "hey" 
str2 = "there" 
str3 = "little" 
str4 = "script" 

db = pymysql.connect("localhost","root","**********","stromtarife") 

cursor = db.cursor() 

cursor.execute("SELECT * FROM vattenfall") 
cursor.execute("INSERT INTO vattenfall (tarif, sofortbonus, treuebonus, jahrespreis) VALUES (%s, %s, %s, %s)", (str1, str2, str3, str4)) 
cursor.execute("SELECT * FROM vattenfall") 
data = cursor.fetchone() 
print(data) 
db.commit() 
cursor.close() 

db.close()

Code end

After I ran the script, my database had a new record, so it is not my pymysql.connect() call that is broken.

Here is my Scrapy code:

vattenfall_form.py

# -*- coding: utf-8 -*- 
import scrapy 
from scrapy.crawler import CrawlerProcess 
from stromtarife.items import StromtarifeItem 

from scrapy.http import FormRequest 

class VattenfallEasy24KemptenV1500Spider(scrapy.Spider): 
    name = 'vattenfall-easy24-v1500-p87435' 

    def start_requests(self): 
     return [ 
      FormRequest(
       "https://www.vattenfall.de/de/stromtarife.htm", 
       formdata={"place": "87435", "zipCode": "87435", "cityName": "Kempten", 
         "electricity_consumptionprivate": "1500", "street": "", "hno": ""}, 
      callback=self.parse 
     ), 
    ] 

    def parse(self, response): 
     item = StromtarifeItem() 
     item['jahrespreis'] = response.xpath('/html/body/main/div[1]/div[2]/div/div[3]/div[2]/div/div[2]/form[1]/div/div[2]/table/tbody/tr[3]/td[2]/text()').extract_first() 
     item['treuebonus'] = response.xpath('/html/body/main/div[1]/div[2]/div/div[3]/div[2]/div/div[2]/form[1]/div/div[2]/table/tbody/tr[2]/td/strong/text()').extract_first() 
     item['sofortbonus'] = response.xpath('/html/body/main/div[1]/div[2]/div/div[3]/div[2]/div/div[2]/form[1]/div/div[2]/table/tbody/tr[1]/td/strong/text()').extract_first() 
     item['tarif'] = response.xpath('/html/body/main/div[1]/div[2]/div/div[3]/div[2]/div/div[1]/h2/span/text()').extract_first() 
     yield item 



class VattenfallEasy24KemptenV2500Spider(scrapy.Spider): 
    name = 'vattenfall-easy24-v2500-p87435' 

    def start_requests(self): 
     return [ 
        FormRequest(
        "https://www.vattenfall.de/de/stromtarife.htm", 
        formdata={"place": "87435", "zipCode": "87435", "cityName": "Kempten", 
           "electricity_consumptionprivate": "2500", "street": "", "hno": ""}, 
        callback=self.parse 
       ), 
    ] 

    def parse(self, response): 
     item = StromtarifeItem() 
     item['jahrespreis'] = response.xpath('/html/body/main/div[1]/div[2]/div/div[3]/div[2]/div/div[2]/form[1]/div/div[2]/table/tbody/tr[3]/td[2]/text()').extract_first() 
     item['treuebonus'] = response.xpath('/html/body/main/div[1]/div[2]/div/div[3]/div[2]/div/div[2]/form[1]/div/div[2]/table/tbody/tr[2]/td/strong/text()').extract_first() 
     item['sofortbonus'] = response.xpath('/html/body/main/div[1]/div[2]/div/div[3]/div[2]/div/div[2]/form[1]/div/div[2]/table/tbody/tr[1]/td/strong/text()').extract_first() 
     item['tarif'] = response.xpath('/html/body/main/div[1]/div[2]/div/div[3]/div[2]/div/div[1]/h2/span/text()').extract_first() 
     yield item 



process = CrawlerProcess() 
process.crawl(VattenfallEasy24KemptenV1500Spider) 
process.crawl(VattenfallEasy24KemptenV2500Spider) 
process.start()

pipelines.py

import pymysql 
from stromtarife.items import StromtarifeItem 


class StromtarifePipeline(object): 
    def __init__(self): 
     self.connection = pymysql.connect("localhost","root","**********","stromtarife") 
     self.cursor = self.connection.cursor() 


    def process_item(self, item, spider): 
     self.cursor.execute("INSERT INTO vattenfall (tarif, sofortbonus, treuebonus, jahrespreis) VALUES (%s, %s, %s, %s)", (item['tarif'], item['sofortbonus'], item['treuebonus'], item['jahrespreis'])) 
     self.connection.commit() 
     self.cursor.close() 
     self.connection.close()

settings.py (I changed only one line)

ITEM_PIPELINES = { 
    'stromtarife.pipelines.StromtarifePipeline': 300, 
}

So, what is wrong with my code? I can't figure it out, and I would be very glad if someone spots what I am missing. Thanks in advance!

Source

2017-04-27 tolgaIsThere

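You should not close the pymysql connection every time an item is processed.

You should write a close_spider function in your pipeline like this, so that the connection is closed only once, at the end of the run: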

def close_spider(self, spider): 
     self.cursor.close() 
     self.connection.close()

You also need to return the item at the end of process_item.

Your pipelines.py file should look like this:

import pymysql 
from stromtarife.items import StromtarifeItem 


class StromtarifePipeline(object): 
    def __init__(self): 
     self.connection = pymysql.connect("localhost","root","**********","stromtarife") 
     self.cursor = self.connection.cursor() 


    def process_item(self, item, spider): 
     self.cursor.execute("INSERT INTO vattenfall (tarif, sofortbonus, treuebonus, jahrespreis) VALUES (%s, %s, %s, %s)", (item['tarif'], item['sofortbonus'], item['treuebonus'], item['jahrespreis'])) 
     self.connection.commit() 
     return item 

    def close_spider(self, spider): 
     self.cursor.close() 
     self.connection.close()
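
To see why closing the connection inside process_item is a problem, here is a minimal sketch. It uses the standard-library sqlite3 module as a stand-in for pymysql so it runs without a MySQL server (an assumption made purely for illustration; sqlite3 uses ? placeholders where pymysql uses %s). The class name ClosingPipeline and the reduced two-column table are hypothetical.

```python
import sqlite3


class ClosingPipeline:
    """Mimics the original pipeline: closes the connection inside process_item."""

    def __init__(self):
        # In-memory database as a stand-in for the MySQL connection.
        self.connection = sqlite3.connect(":memory:")
        self.cursor = self.connection.cursor()
        self.cursor.execute(
            "CREATE TABLE vattenfall (tarif TEXT, jahrespreis TEXT)"
        )

    def process_item(self, item, spider=None):
        self.cursor.execute(
            "INSERT INTO vattenfall (tarif, jahrespreis) VALUES (?, ?)",
            (item["tarif"], item["jahrespreis"]),
        )
        self.connection.commit()
        # Bug under test: after this, no later item can be written.
        self.cursor.close()
        self.connection.close()
        return item


pipeline = ClosingPipeline()
pipeline.process_item({"tarif": "first", "jahrespreis": "1"})

try:
    pipeline.process_item({"tarif": "second", "jahrespreis": "2"})
    second_insert_failed = False
except sqlite3.ProgrammingError:
    # The cursor and connection were already closed by the first call.
    second_insert_failed = True
```

The first item is written, but the second call fails because the cursor is already closed. Moving the two close() calls into close_spider, as above, avoids this.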

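UPDATE:

I tried your code. The problem is in the pipeline, and there are two issues:

You are trying to insert the euro symbol €, and I think MySQL does not like it.
Your query string is not built properly.

I managed to get it working by writing the pipeline like this: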

def process_item(self, item, spider): 
    query = """INSERT INTO vattenfall (tarif, sofortbonus, treuebonus, jahrespreis) VALUES (%s, %s, %s, %s)""" % ("1", "2", "3", "4") 
    self.cursor.execute(query) 
    self.connection.commit() 
    return item

I think you should remove the € from the prices you are trying to insert.
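
A minimal sketch of how the € could be stripped before the insert. The helper name strip_euro and the sample price format are assumptions, not taken from the actual Vattenfall page; extract_first() can return None, so that case is passed through unchanged.

```python
def strip_euro(value):
    """Remove the euro sign and surrounding whitespace from a scraped price."""
    if value is None:
        # extract_first() returns None when the XPath matches nothing.
        return None
    return value.replace("€", "").strip()


cleaned = strip_euro("587,40 €")
```

In process_item this could be applied to each price field before the values are passed to cursor.execute(), for example item['jahrespreis'] = strip_euro(item['jahrespreis']).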

Hope this helps, let me know.

Source

2017-04-27 11:28:06

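Thank you for your answer. I changed process_item() and added close_spider(), but I still don't get anything into my database. Once I get results I can take the next step and follow rrschmidt's advice. I really don't know what is wrong with my code.. – tolgaIsThere

In process_item(), I replaced part of the cursor.execute() call: ...%s, %s)", (item['tarif'], item['sofortbonus'], item['treuebonus'], item['jahrespreis'])) with strings: ...%s, %s)", ("hey", "how", "are", "you")) and it still does not work.. – tolgaIsThere

I updated my previous answer, let me know if this works for you –

Besides the SQL pipeline closing the SQL connection after writing the first item (as Adrien pointed out), there is another problem.

The other problem is: your scraper only scrapes a single item per results page (and only visits one results page). I checked Vattenfall, and it usually shows multiple results, which I assume you want to scrape as well.

This means you also have to iterate over the results on the page and create multiple items while doing so. The Scrapy tutorial gives a good explanation here: https://doc.scrapy.org/en/latest/intro/tutorial.html#extracting-quotes-and-authors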
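
A rough sketch of that loop, written as a plain generator so it can be read on its own. The function name iter_tariffs and all three XPath expressions are hypothetical placeholders; the real ones depend on the markup of the Vattenfall results page. It only relies on the .xpath() and .extract_first() calls already used in the spider.

```python
def iter_tariffs(response):
    """Yield one dict per result block instead of one item per page."""
    # Hypothetical XPath selecting every result block on the page.
    for block in response.xpath('//div[contains(@class, "tariff")]'):
        yield {
            # Relative XPaths (leading dot) are evaluated inside each block.
            "tarif": block.xpath(".//h2/span/text()").extract_first(),
            "jahrespreis": block.xpath(
                ".//table//tr[3]/td[2]/text()"
            ).extract_first(),
        }
```

Inside the spider, parse() could loop over the same blocks and yield a StromtarifeItem for each one, so the pipeline receives one item per displayed tariff.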

Source

2017-04-27 13:05:31 rrschmidt
