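Scraping data after a form submit with Scrapy

I'm trying to scrape content from a listing's detail pages, which can only be reached by clicking a "View" button that triggers this form submit. I'm new to Python and Scrapy.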
Example markup:
<li><h3>Abc Widgets</h3>
<form action="/viewlisting?id=123" method="post">
<input type="image" src="/images/view.png" value="submit" >
</form>
</li>
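My Scrapy approach is to extract the form action, then use a Request with a callback to parse the returned page for the content I want. However, I've run into a couple of problems.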
First, I get the following error: "Request url must be str or unicode".
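The error fits what `.extract()` returns: a list of strings, even when the XPath matches a single attribute, so `Request(url=action, ...)` receives a list instead of a string. In Scrapy, `form.xpath('@action').extract_first()` (or `.extract()[0]` on older versions) gives a single string or `None`, and `response.urljoin(...)` turns the relative `/viewlisting?id=123` into the absolute URL that `Request` expects. The sketch below reproduces those two steps with the Python 3 standard library (`html.parser`, `urllib.parse.urljoin`) rather than Scrapy, so it runs without a crawl; `FormActionCollector` is a hypothetical helper, and the page URL is the redacted one from the question.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormActionCollector(HTMLParser):
    """Hypothetical helper: collects the action attribute of every <form> tag."""

    def __init__(self):
        super().__init__()
        self.actions = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            action = dict(attrs).get("action")
            if action is not None:
                self.actions.append(action)


MARKUP = """<li><h3>Abc Widgets</h3>
<form action="/viewlisting?id=123" method="post">
<input type="image" src="/images/view.png" value="submit" >
</form>
</li>"""

PAGE_URL = ("http://example.com/wps/wcm/connect/internet/wfi/Contact+Us/"
            "Find+Your+Local+Office/findYourLocalOffice.jsp?state=WA")

collector = FormActionCollector()
collector.feed(MARKUP)

# Like .extract(): always a list of strings, even for a single match.
actions = collector.actions
# Like .extract_first(): one string, or None when nothing matched.
first_action = actions[0] if actions else None
# The action is relative, so resolve it against the page it came from,
# which is the job response.urljoin(...) does inside a Scrapy callback.
absolute_url = urljoin(PAGE_URL, first_action) if first_action else None

print(actions)
print(absolute_url)
```

Passing `absolute_url` (a plain string) rather than `actions` (a list) is the change that matters for the `Request` call.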
Second, when I hardcode the URL to get around that problem, my parse function seems to return something that looks like a list.
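What `parse_profile` prints is a `SelectorList`, a list of `Selector` objects rather than strings. Calling `.extract()` on it returns the matched text nodes as a list of strings, which can then be stripped and joined; in Scrapy the callback would then typically yield an item (for example the imported `Wfi2Item`, whose fields are not shown) instead of printing. The sketch below, again using the Python 3 standard library instead of Scrapy, shows the end result being aimed for: plain strings gathered from inside the element with class `contentContainer`. The profile markup and `ClassTextCollector` are hypothetical, and the sketch collects all descendant text (closer to `//*[@class="contentContainer"]//text()`) rather than reproducing the question's XPath exactly.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the depth count.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextCollector(HTMLParser):
    """Hypothetical helper: collects stripped text found anywhere inside
    elements whose class attribute is exactly `target_class`."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0          # > 0 while inside a matching element
        self.fragments = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("class") == self.target_class:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.fragments.append(data.strip())


# Hypothetical profile markup; the real page structure is not shown in the question.
PROFILE_HTML = ('<div class="contentContainer"><h2>Jane Smith</h2>'
                '<p>Area Manager<br>Perth</p></div><p>Footer</p>')

collector = ClassTextCollector("contentContainer")
collector.feed(PROFILE_HTML)
profile_text = " | ".join(collector.fragments)
print(profile_text)
```

The text outside the container ("Footer") is skipped, and what comes out is a list of strings rather than selector objects.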
Here is my code, with the real URLs redacted:
from scrapy.spiders import Spider
from scrapy.selector import Selector
from scrapy.http import Request
from wfi2.items import Wfi2Item


class ProfileSpider(Spider):
    name = "profiles"
    allowed_domains = ["wfi.com.au"]
    start_urls = ["http://example.com/wps/wcm/connect/internet/wfi/Contact+Us/Find+Your+Local+Office/findYourLocalOffice.jsp?state=WA",
                  "http://example.com/wps/wcm/connect/internet/wfi/Contact+Us/Find+Your+Local+Office/findYourLocalOffice.jsp?state=VIC",
                  "http://example.com/wps/wcm/connect/internet/wfi/Contact+Us/Find+Your+Local+Office/findYourLocalOffice.jsp?state=QLD",
                  "http://example.com/wps/wcm/connect/internet/wfi/Contact+Us/Find+Your+Local+Office/findYourLocalOffice.jsp?state=NSW",
                  "http://example.com/wps/wcm/connect/internet/wfi/Contact+Us/Find+Your+Local+Office/findYourLocalOffice.jsp?state=TAS",
                  "http://example.com/wps/wcm/connect/internet/wfi/Contact+Us/Find+Your+Local+Office/findYourLocalOffice.jsp?state=NT"
                  ]

    def parse(self, response):
        hxs = Selector(response)
        # Every <form> inside the element with id "area-managers"
        forms = hxs.xpath('//*[@id="area-managers"]//*/form')
        for form in forms:
            action = form.xpath('@action').extract()
            print "ACTION: ", action
            request = Request(url=action, callback=self.parse_profile)
            yield request

    def parse_profile(self, response):
        hxs = Selector(response)
        profile = hxs.xpath('//*[@class="contentContainer"]/*/text()')
        print "PROFILE", profile
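The markup also declares `method="post"`, while Scrapy's `Request` sends a GET unless `method="POST"` is passed. Whether this site also answers a GET on `/viewlisting` isn't stated, so matching the form's method seems the safer assumption; in the spider that would look like `Request(url=..., method="POST", callback=self.parse_profile)`, and `FormRequest.from_response` is another option that builds the request from the form itself. The stdlib sketch below only shows how the method and target URL follow from the form's attributes under browser defaults; `submission_for` is a hypothetical helper, not part of Scrapy.

```python
from urllib.parse import urljoin


def submission_for(page_url, form_attrs):
    """Return (method, absolute_url) for submitting a form that has no fields.

    Hypothetical helper. Mirrors browser defaults: a missing method means GET,
    and a missing or empty action means the page's own URL.
    """
    method = (form_attrs.get("method") or "get").upper()
    action = form_attrs.get("action") or page_url
    return method, urljoin(page_url, action)


PAGE_URL = ("http://example.com/wps/wcm/connect/internet/wfi/Contact+Us/"
            "Find+Your+Local+Office/findYourLocalOffice.jsp?state=WA")

# Attributes of the form in the example markup.
print(submission_for(PAGE_URL, {"action": "/viewlisting?id=123", "method": "post"}))
```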
Thanks for the clear explanation and for pointing to the relevant sections of the docs. – htmlr