2017-10-07 70 views
0

Find and get elements from a website with Selenium

I need your help. I have a website that I need to get information from. An example of the site's HTML was posted as an image.

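I need to get the data from the elements with class `inputField`, but I have to sort it by the label in the element with class `key`. For example: if the `key` is `Type of Work`, the data from `inputField` goes into `var1`; if the `key` is `Application No.`, it goes into `var2`; and if the `key` is `Date Lodged`, it goes into `var3`. Code: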

    import scrapy
    from tasks.items import TasksItem
    from selenium import webdriver
    from selenium.webdriver.common.by import By


    class MySpider(scrapy.Spider):
        title = []
        type = []
        name = 'Spider'
        allowed_domains = ['https://ecouncil.bayside.vic.gov.au/']

        driver = webdriver.Chrome('C:/TEMP/Scrapy/chromedriver')

        driver.get('https://ecouncil.bayside.vic.gov.au/eservice/daEnquiryInit.do?docType=5&nodeNum=1118')
        driver.get('https://ecouncil.bayside.vic.gov.au/eservice/daEnquiry.do?number=&lodgeRangeType=on&dateFrom=01%2F09%2F2017&dateTo=30%2F09%2F2017&detDateFromString=&detDateToString=&streetName=&suburb=0&unitNum=&houseNum=0%0D%0A%09%09%09%09%09&planNumber=&strataPlan=&lotNumber=&propertyName=&searchMode=A&submitButton=Search')

        title = driver.find_elements_by_css_selector('a.plain_header')
        type = driver.find_elements_by_css_selector('p.rowDataOnly')
        for i in type:
            t1 = i.find_element_by_class_name('key').text
            if t1 == 'Type of Work':
                var1 = t1
            elif t1 == 'some_text':
                var2 = t1
            else:
                var3 = t1

But I don't know how I can get the data from `inputField`.

+0

Please read why [screenshots of code are a bad idea](https://meta.stackoverflow.com/questions/303812/discourage-screenshots-of-code-and-orrors). Paste the code as text and format it properly. – JeffC

Answers

0

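Your current logic won't work. What you want to do is get the number of properties and then loop over them. On each pass through the loop, you grab the three items you are interested in and store them in three variables (you really should use more descriptive names, by the way).

Something like the following should do it.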

    import scrapy
    from selenium import webdriver


    class MySpider(scrapy.Spider):
        title = []
        type = []
        name = 'Spider'
        allowed_domains = ['https://ecouncil.bayside.vic.gov.au/']

        driver = webdriver.Chrome('C:/TEMP/Scrapy/chromedriver')

        # Load the enquiry page first, then the search results for September 2017
        driver.get('https://ecouncil.bayside.vic.gov.au/eservice/daEnquiryInit.do?docType=5&nodeNum=1118')
        driver.get('https://ecouncil.bayside.vic.gov.au/eservice/daEnquiry.do?number=&lodgeRangeType=on&dateFrom=01%2F09%2F2017&dateTo=30%2F09%2F2017&detDateFromString=&detDateToString=&streetName=&suburb=0&unitNum=&houseNum=0%0D%0A%09%09%09%09%09&planNumber=&strataPlan=&lotNumber=&propertyName=&searchMode=A&submitButton=Search')

        # One title link per property in the results
        titles = driver.find_elements_by_css_selector('a.plain_header')
        # range() already excludes its end value, so no "- 1" is needed here
        for i in range(len(titles)):
            # Find each label span, then take the value span that follows it
            var1 = driver.find_elements_by_xpath("//span[@class='key'][.='Type of Work']/following-sibling::span[@class='inputField']")[i].text
            var2 = driver.find_elements_by_xpath("//span[@class='key'][.='Application No.']/following-sibling::span[@class='inputField']")[i].text
            var3 = driver.find_elements_by_xpath("//span[@class='key'][.='Date Lodged']/following-sibling::span[@class='inputField']")[i].text

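To make this easier to maintain (and read), you could take the last three lines of the code and move them into a function that is passed the field name, e.g. `Date Lodged`, and returns the field value, e.g. 01/09/2017. I'll leave that as an exercise for you.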
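One way that exercise might turn out is sketched below. The function names `field_xpath`, `get_field_values` and `get_applications`, and the `FIELD_NAMES` tuple, are made up for illustration and are not part of the answer. The sketch assumes `driver` exposes `find_elements_by_xpath` as in the code above (newer Selenium releases replace this with `find_elements(By.XPATH, ...)`), that every property on the page shows all three fields, and that field names contain no single quotes.

```python
# Hypothetical helper names; only the XPath itself comes from the answer above.
FIELD_NAMES = ("Type of Work", "Application No.", "Date Lodged")


def field_xpath(field_name):
    """Build the XPath for the value span that follows a given label span."""
    # Assumes field_name contains no single quote, which would break the XPath.
    return (
        "//span[@class='key'][.='{}']"
        "/following-sibling::span[@class='inputField']"
    ).format(field_name)


def get_field_values(driver, field_name):
    """Return the text of every value on the page for one field name."""
    elements = driver.find_elements_by_xpath(field_xpath(field_name))
    return [element.text for element in elements]


def get_applications(driver):
    """Return one (type_of_work, application_no, date_lodged) tuple per property.

    Each field is queried once, and the three lists are then zipped together,
    which assumes every property on the page shows all three fields.
    """
    columns = [get_field_values(driver, name) for name in FIELD_NAMES]
    return list(zip(*columns))
```

With this in place, the loop body reduces to iterating over `get_applications(driver)`, and the page is queried three times in total rather than three times per property.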

+0

Sorry, but on `for i in range(0, titles.count - 1):` I get `TypeError: unsupported operand type(s) for -: 'builtin_function_or_method' and 'int'` –

+0

I was using `.count` instead of `len()`... wrong language. Try it now. – JeffC

+0

It works! Thanks! –

-1

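I tried getting the data in Java. You can use the same approach in Python.

You can get all the span elements with `class='key'` and with `class='inputField'`, then iterate over them and pick out the information you are interested in.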

    import java.util.List;

    import org.openqa.selenium.By;
    import org.openqa.selenium.WebDriver;
    import org.openqa.selenium.WebElement;
    import org.openqa.selenium.firefox.FirefoxDriver;

    // The class name is arbitrary; it only wraps main() so the code compiles.
    public class BaysideEnquiry {
        public static void main(String[] args) {
            WebDriver driver = new FirefoxDriver();

            driver.get("https://ecouncil.bayside.vic.gov.au/eservice/daEnquiryInit.do?docType=5&nodeNum=1118");
            driver.get("https://ecouncil.bayside.vic.gov.au/eservice/daEnquiry.do?number=&lodgeRangeType=on&dateFrom=01%2F09%2F2017&dateTo=30%2F09%2F2017&detDateFromString=&detDateToString=&streetName=&suburb=0&unitNum=&houseNum=0%0D%0A%09%09%09%09%09&planNumber=&strataPlan=&lotNumber=&propertyName=&searchMode=A&submitButton=Search");

            // Both lists come back in document order; the loop assumes that
            // keys.get(j) and inputFields.get(j) belong to the same row.
            List<WebElement> keys = driver.findElements(By.xpath("//span[@class='key']"));
            List<WebElement> inputFields = driver.findElements(By.xpath("//span[@class='inputField']"));

            String var1, var2, var3;

            for (int j = 0; j < keys.size(); j++) {
                // Read each text once instead of calling getText() repeatedly
                String key = keys.get(j).getText();
                String value = inputFields.get(j).getText();
                System.out.println("key: " + key);
                System.out.println("inputField: " + value);
                if (key.equalsIgnoreCase("Type of Work")) {
                    var1 = value;
                    System.out.println("var1: " + var1);
                } else if (key.equalsIgnoreCase("Application No.")) {
                    var2 = value;
                    System.out.println("var2: " + var2);
                } else if (key.equalsIgnoreCase("Date Lodged")) {
                    var3 = value;
                    System.out.println("var3: " + var3);
                }

                System.out.println("------------------" + j + "------------------");
            }

            // Close the browser when finished
            driver.quit();
        }
    }
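
Since the question is about Python, here is a rough sketch of the same pairing idea in that language. It works on plain lists of strings so that it can be tried without a browser. The names `WANTED` and `pair_fields`, and the dictionary keys, are invented for this illustration. Like the Java code, it assumes the two lists line up one-to-one; it additionally assumes that a repeated label marks the start of the next application.

```python
# Hypothetical mapping from on-page labels (lower-cased) to record keys.
WANTED = {
    "type of work": "type_of_work",
    "application no.": "application_no",
    "date lodged": "date_lodged",
}


def pair_fields(key_texts, value_texts):
    """Group parallel label/value texts into one dict per application.

    Assumes key_texts[j] labels value_texts[j], and that seeing the same
    label a second time means a new application has started.
    """
    records = []
    current = {}
    for key, value in zip(key_texts, value_texts):
        # Case-insensitive match, mirroring equalsIgnoreCase in the Java code
        name = WANTED.get(key.strip().lower())
        if name is None:
            continue  # a label we are not interested in
        if name in current:
            records.append(current)
            current = {}
        current[name] = value.strip()
    if current:
        records.append(current)
    return records
```

With Selenium, the two input lists could be built from the `.text` of the elements matched by `//span[@class='key']` and `//span[@class='inputField']`, the same XPath expressions used in the Java code.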