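Python Scrapy: log in to a website and then scrape
I want to try something new to practise web scraping. I'm trying to log in to a website and then scrape specific items.
I've built the code below for this, but it doesn't work. I use scrapy.FormRequest to log in.
Based on what I've read in the docs so far, I have the following code set up: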
import scrapy


class HomelyspiderSpider(scrapy.Spider):
    name = "homelyspider"
    allowed_domains = ["homely.com.au"]
    start_urls = ['https://homely.com.au/']

    def parse(self, response):
        # Fill in and submit the sign-in form located by formxpath.
        yield scrapy.FormRequest.from_response(
            response,
            formxpath='.//div[@class="Modal-body"]/form',
            formdata={
                'usernameOrEmail': 'myusername',
                'password': 'mypassword',
            },
            clickdata={"type": "Submit"},
            callback=self.after_login,
        )

    def after_login(self, response):
        # DO SCRAPING NOW
        pass
Login page HTML:
<div class="Auth Auth--modal">
<div class="signin ">
<div class="Modal-header">
<h1 class="Modal-title">Sign in</h1>
</div>
<div class="Modal-body">
<p class="subtitle">Instant sign in with Facebook or Google:</p><a class="Button Button--icon Button--facebook small-12" href="/authentication/redirect/Facebook"><span role="presentation" class="icon-wrapper"><svg class="icon icon-facebook"><use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#icon-facebook"></use></svg></span><span class="label">Continue with Facebook</span></a><a class="Button Button--icon Button--google small-12" href="/authentication/redirect/Google"><span role="presentation" class="icon-wrapper"><svg class="icon icon-google"><use xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#icon-google"></use></svg></span><span class="label">Continue with Google</span></a>
<p>or using your email:</p>
<form>
<label class="">
<input type="text" aria-label="Email or Username" required="" pattern="^[^-\s].+" title="Please enter a valid value" name="usernameOrEmail" placeholder="Email or Username" class="FormControl" value="">
</label>
<label class="">
<input type="password" aria-label="Password" required="" pattern="^[^-\s].+" title="Please enter a valid value" name="password" placeholder="Password" class="FormControl">
</label>
<button class="Button Button--alt small-12" type="submit"><span class="Button-message">Sign In</span>
</button>
</form>
<p class="forgotten">
<button class="ButtonLink">Forgot Password?</button>
</p>
</div>
<div class="Modal-line"></div>
<div class="Modal-footer">
<p>
<!-- react-text: 71 -->Not yet a member?
<!-- /react-text -->
<button class="ButtonLink">Register with Homely</button>
</p>
</div>
</div>
</div>
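I know this may be irrelevant since the form is in the page, but I'm still showing the steps and the elements involved.
This is the home page, where I have to click Sign in:
Then the sign-in popup appears, containing the form whose code I provided above:
What am I doing wrong here? From what I understand of the Scrapy docs, my FormRequest code should work, right?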
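I can see that too.. can you tell why? The form XPath looks fine –
No, because I also get an error when using the XPath, and I don't know why – minime
I see the problem now: the form is not displayed until I click the sign-in button –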
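If the form only appears after the sign-in button is clicked, it may simply be absent from the HTML that the server returns, in which case FormRequest.from_response has nothing to find. A minimal sketch for checking this, using only the standard library's html.parser rather than Scrapy selectors; the names FormFieldCollector and login_form_present, and both sample HTML strings, are hypothetical and only for illustration:

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect the names of <input> fields for every <form> in a page."""

    def __init__(self):
        super().__init__()
        self.forms = []          # one list of input names per <form>
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self._in_form = True
            self.forms.append([])
        elif tag == "input" and self._in_form:
            name = dict(attrs).get("name")
            if name:
                self.forms[-1].append(name)

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def login_form_present(html, required=("usernameOrEmail", "password")):
    """Return True if some <form> in the HTML contains all required inputs."""
    parser = FormFieldCollector()
    parser.feed(html)
    parser.close()
    return any(all(field in names for field in required)
               for names in parser.forms)


# Hypothetical raw page: the modal has not been rendered yet.
raw_page = '<html><body><div id="app"></div></body></html>'

# Hypothetical page after the sign-in modal has been rendered.
rendered_page = (
    '<div class="Modal-body"><form>'
    '<input type="text" name="usernameOrEmail">'
    '<input type="password" name="password">'
    '<button type="submit">Sign In</button>'
    '</form></div>'
)

print(login_form_present(raw_page))
print(login_form_present(rendered_page))
```

Inside the spider, the same check could be run as login_form_present(response.text) at the top of parse. If it returns False, the form is probably being added by JavaScript after the page loads, and the XPath is not the cause of the error.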