2017-05-25 102 views
0

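Python Scrapy: log in to a website, then scrape

I want to practise web scraping by trying something new. I am trying to log in to a website and then scrape specific items.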

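I have written the code below for this, but it does not work. I use scrapy.FormRequest to log in; based on what I have read in the docs so far, I have the following code set up: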

import scrapy


class HomelyspiderSpider(scrapy.Spider):
    name = "homelyspider"
    allowed_domains = ["homely.com.au"]
    start_urls = ['https://homely.com.au/']

    def parse(self, response):
        # Fill in and submit the sign-in form expected on the start page.
        yield scrapy.FormRequest.from_response(
            response,
            formxpath='.//div[@class="Modal-body"]/form',
            formdata={
                'usernameOrEmail': 'myusername',
                'password': 'mypassword',
            },
            clickdata={"type": "Submit"},
            callback=self.after_login,
        )

    def after_login(self, response):
        # DO SCRAPING NOW
        pass

Login page HTML

<div class="Auth Auth--modal"> 
    <div class="signin "> 
     <div class="Modal-header"> 
      <h1 class="Modal-title">Sign in</h1> 
     </div> 
     <div class="Modal-body"> 
      <p class="subtitle">Instant sign in with Facebook or Google:</p><a class="Button Button--icon Button--facebook small-12" href="/authentication/redirect/Facebook"><span role="presentation" class="icon-wrapper"><svg class="icon icon-facebook"><use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#icon-facebook"></use></svg></span><span class="label">Continue with Facebook</span></a><a class="Button Button--icon Button--google small-12" href="/authentication/redirect/Google"><span role="presentation" class="icon-wrapper"><svg class="icon icon-google"><use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#icon-google"></use></svg></span><span class="label">Continue with Google</span></a> 
      <p>or using your email:</p> 
      <form> 
       <label class=""> 
        <input type="text" aria-label="Email or Username" required="" pattern="^[^-\s].+" title="Please enter a valid value" name="usernameOrEmail" placeholder="Email or Username" class="FormControl" value=""> 
       </label> 
       <label class=""> 
        <input type="password" aria-label="Password" required="" pattern="^[^-\s].+" title="Please enter a valid value" name="password" placeholder="Password" class="FormControl"> 
       </label> 
       <button class="Button Button--alt small-12" type="submit"><span class="Button-message">Sign In</span> 
       </button> 
      </form> 
      <p class="forgotten"> 
       <button class="ButtonLink">Forgot Password?</button> 
      </p> 
     </div> 
     <div class="Modal-line"></div> 
     <div class="Modal-footer"> 
      <p> 
       <!-- react-text: 71 -->Not yet a member? 
       <!-- /react-text --> 
       <button class="ButtonLink">Register with Homely</button> 
      </p> 
     </div> 
    </div> 
</div> 

I know this may be beside the point because the form is in the page, but I am still showing the steps and elements that lead to it.

This is the home page, where I have to click Sign in:

enter image description here

enter image description here

Then the login popup appears, containing the form code I provided earlier:

enter image description here

What am I doing wrong here? From what I understand of the Scrapy docs, my Scrapy form request code should work, right?

Answers

0

ValueError: No <form> element found in <...>. It is not finding the form...

+0

I can see that too. Can you tell why? The form XPath is fine

+0

No, because I also get the error when using the XPath, and I don't know why – minime

+0

I see the problem now: the form is not displayed until I click the Sign in button
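
If the form only appears after Sign in is clicked, it is probably added to the page by JavaScript, and the HTML that Scrapy downloads may contain no <form> at all. That would explain the ValueError. Below is a minimal sketch of how one might check this, using only the standard library. The names `FormFieldCollector` and `find_form_fields` and the `INITIAL_HTML` page shell are made up for illustration. Only `MODAL_HTML` is based on the markup in the question.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect the names of <input> fields inside each <form> of a document."""

    def __init__(self):
        super().__init__()
        self.forms = []        # one list of input names per <form> seen
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self._in_form = True
            self.forms.append([])
        elif tag == "input" and self._in_form:
            name = dict(attrs).get("name")
            if name:
                self.forms[-1].append(name)

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def find_form_fields(html):
    """Return one entry per <form>, each a list of its input names."""
    collector = FormFieldCollector()
    collector.feed(html)
    collector.close()
    return collector.forms


# Hypothetical page shell (an assumption): what a server might send
# before any script has run, with no sign-in form in it.
INITIAL_HTML = (
    '<html><body><div id="app"></div>'
    '<script src="bundle.js"></script></body></html>'
)

# Trimmed copy of the modal markup from the question, which exists
# in the browser only after the Sign in button is clicked.
MODAL_HTML = """
<div class="Modal-body">
  <form>
    <input type="text" name="usernameOrEmail" value="">
    <input type="password" name="password">
    <button type="submit">Sign In</button>
  </form>
</div>
"""

print(find_form_fields(INITIAL_HTML))  # no forms in the page shell
print(find_form_fields(MODAL_HTML))    # one form with two named inputs
```

Inside the spider, calling `find_form_fields(response.text)` at the top of `parse` would show whether the downloaded page has any form. An empty list would mean `FormRequest.from_response` has nothing to work with, whatever `formxpath` is passed.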