2016-04-03 44 views
1

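Populating a MySQL table with scraped data

I am using Python 3, MySQL, Sequel Pro, and BeautifulSoup.

In short, I want to create an SQL table and then insert the data I have scraped into it.

I used the answer to "Beautiful soup webscrape into mysql" as a template for the SQL part, but it does not work.

The error thrown is: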

line 86 finally:SyntaxError: invalid syntax 

When I comment out this last finally: (just to see whether the rest of the code works), I get:

InternalError: (1054, "Unknown column 'address' in 'field list'") 

Another error I commonly get is:

ProgrammingError: (1146, "Table 'simple_scrape.simple3' doesn't exist"), although I don't remember the exact change I made that produced this last error.

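Finally, I started learning to program less than four weeks ago (not just Python, but programming in general). If you are wondering why I did something silly or inefficient, it is almost certainly because that was the first way I got it to work! Please help!

Code: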

from selenium import webdriver

#Guess BER Number
for i in range(108053983,108053985):
    try:
        # ber_try = 100000000
        ber_try =+i

        #Open page & insert BER Number
        browser = webdriver.Firefox()
        type(browser)
        browser.get('https://ndber.seai.ie/pass/ber/search.aspx')
        ber_send = browser.find_element_by_id('ctl00_DefaultContent_BERSearch_dfSearch_txtBERNumber')
        ber_send.send_keys(ber_try)

        #click search
        form = browser.find_element_by_id('ctl00_DefaultContent_BERSearch_dfSearch_Bottomsearch')
        form.click()

        #click intermediate page
        form = browser.find_element_by_id('ctl00_DefaultContent_BERSearch_gridRatings_gridview_ctl02_ViewDetails')
        form.click()

        #scrape the page
        import bs4
        soup = bs4.BeautifulSoup(browser.page_source)

        # First Section
        ber_dec = soup.find('fieldset', {'id':'ctl00_DefaultContent_BERSearch_fsBER'})

        address = ber_dec.find('div', {'id':'ctl00_DefaultContent_BERSearch_dfBER_div_PublishingAddress'})
        address = (address.get_text(', ').strip())
        print(address)

        date_issue = ber_dec.find('span', {'id':'ctl00_DefaultContent_BERSearch_dfBER_container_DateOfIssue'})
        date_issue = date_issue.get_text().strip()
        print(date_issue)

    except:
        print('Invalid BER Number:', ber_try)
        browser.quit()

    #connecting to mysql
    finally:
        import pymysql.cursors
        from pymysql import connect, err, sys, cursors

        #Making the connection
        connection = pymysql.connect(host = '127.0.0.1',
                                     port = 3306,
                                     user = 'root',
                                     passwd = 'root11',
                                     db = 'simple_scrape',
                                     cursorclass=pymysql.cursors.DictCursor);

        with connection.cursor() as cursor:
            sql= """CREATE TABLE `simple3`(
            (
            `ID` INT AUTO_INCREMENT NOT NULL,
            `address` VARCHAR(200) NOT NULL,
            `date_issue` VARCHAR(200) NOT NULL,

            PRIMARY KEY (`ID`)
            )Engine = MyISAM)"""

            sql = "INSERT INTO `simple3` (`address`, `date_issue`) VALUES (%s, %s)"
            cursor.execute(sql, (address, date_issue))
        connection.commit()
    finally:
        connection.close()

    browser.quit()

Answers

1

Problem: the table is never actually created. The CREATE TABLE statement is assigned to sql but never executed.

    sql = """CREATE TABLE simple3 (
        ID INT AUTO_INCREMENT NOT NULL,
        address VARCHAR(200) NOT NULL,
        date_issue VARCHAR(200) NOT NULL,
        PRIMARY KEY (ID)
    ) ENGINE = MyISAM"""
    # Added this line since your table was not being created.
    cursor.execute(sql)

    sql = "INSERT INTO simple3 (address, date_issue) VALUES (%s, %s)"
    cursor.execute(sql, (address, date_issue))
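
To show how the answer's fix fits into the surrounding code, here is a minimal sketch of the create-then-insert step. It assumes a DB-API connection like the one returned by `pymysql.connect(...)` in the question. The helper name `save_reading` and the `IF NOT EXISTS` clause are additions for illustration and are not part of the original code. It uses a single `try`/`finally`, because Python allows only one `finally` clause per `try` statement, which is what the `SyntaxError` on the second `finally:` points at.

```python
# Hypothetical helper; table and column names are taken from the question.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS simple3 (
    ID INT AUTO_INCREMENT NOT NULL,
    address VARCHAR(200) NOT NULL,
    date_issue VARCHAR(200) NOT NULL,
    PRIMARY KEY (ID)
) ENGINE = MyISAM
"""

INSERT_SQL = "INSERT INTO simple3 (address, date_issue) VALUES (%s, %s)"


def save_reading(connection, address, date_issue):
    """Create the table if needed, insert one row, and always close the connection."""
    try:
        with connection.cursor() as cursor:
            # Run the CREATE TABLE statement before inserting.
            cursor.execute(CREATE_SQL)
            # Values are passed separately so the driver can escape them.
            cursor.execute(INSERT_SQL, (address, date_issue))
        connection.commit()
    finally:
        # Runs whether or not the statements above raised an exception.
        connection.close()
```

Because the helper closes the connection it is given, it would be called with a fresh connection for each scraped page, as the question's loop already does.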
+0

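Thanks very much for getting back to me, but when I do that (I copied and pasted to make sure I didn't miss anything) I get the following error: 'line 74 sql = "CREATE TABLE `simple3`( ^ SyntaxError: EOL while scanning string literal' –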

+1

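Remove the backticks (see the edited version). MySQL does not require backticks unless you use spaces in table or column names (which is not recommended). –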

+2

If you split a string across multiple lines, you can use triple quotes (i.e. """). – ChrisP
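
As a small illustration of the comment above: a string delimited by single or double quotes has to end on the line it starts on, which is what "EOL while scanning string literal" reports. Triple quotes let the literal span several lines, and adjacent literals inside parentheses are an alternative. The variable names and the shortened statement below are made up for this example.

```python
# A triple-quoted literal may contain line breaks.
sql_triple = """CREATE TABLE simple3 (
    ID INT AUTO_INCREMENT NOT NULL,
    PRIMARY KEY (ID)
)"""

# Adjacent string literals inside parentheses are joined into one string.
sql_joined = (
    "CREATE TABLE simple3 ("
    " ID INT AUTO_INCREMENT NOT NULL,"
    " PRIMARY KEY (ID)"
    " )"
)

# Compare the two statements with whitespace normalised.
same_statement = " ".join(sql_triple.split()) == " ".join(sql_joined.split())
print(same_statement)
```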