2016-12-14 101 views

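Unable to access the complete web page using mechanize

I am trying to save the home page of usautoforce using mechanize. @Ertugrul, following your answer I now have the complete page, but when I try to access the username and password fields I get an error. I have already set every control's readonly flag to false. When I open the saved page in an editor, the HTML contains nothing that refers to the username and password fields. Here is my mechanize code: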

import re
import time

import mechanize

br = mechanize.Browser()

br.set_handle_equiv(True)
br.set_handle_redirect(True)
br.set_handle_robots(False)  # do not honour robots.txt
br.addheaders = [
    ('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0'),
    ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'),
    ('Upgrade-Insecure-Requests', '1'),
    ('Connection', 'keep-alive'),
]

br.open("http://www.usautoforce.com/Pages/home.aspx")
print(br.response())  # call response(); printing br.response only shows the bound method
time.sleep(9)

html = br.response().read()

# Rewrite root-relative href/src values as absolute URLs
latest_index = 0
html_replaced = ""
for m in re.finditer('(href|src)(=")(/[^"]+")', html):
    html_replaced += html[latest_index:m.start()] + m.group(1) + m.group(2) + 'http://www.usautoforce.com' + m.group(3)
    latest_index = m.end()
html_replaced += html[latest_index:]  # keep the text after the last match

with open("us.html", "w") as f:
    f.write(html_replaced)

print(list(br.forms())[0])

br.select_form(nr=0)
time.sleep(2)

# Uncomment to list the controls mechanize found in the selected form:
# for control in br.form.controls:
#     print("type=%s, name=%s value=%s" % (control.type, control.name, br[control.name]))

br.form.set_all_readonly(False)
br.form["nexpartuname"] = "abc"  # this line raises ControlNotFoundError
br.form["pwd"] = "xyz"
br.submit()

This is the error:

File "haha.py", line 60, in <module> 
    br.form["nexpartuname"] = "clack" 
    File "/usr/lib/python2.7/site-packages/mechanize/_form.py", line 2775, in __setitem__ 
    control = self.find_control(name) 
    File "/usr/lib/python2.7/site-packages/mechanize/_form.py", line 3096, in find_control 
    return self._find_control(name, type, kind, id, label, predicate, nr) 
    File "/usr/lib/python2.7/site-packages/mechanize/_form.py", line 3180, in _find_control 
    raise ControlNotFoundError("no control matching "+description) 
mechanize._form.ControlNotFoundError: no control matching name 'nexpartuname' 

Answer

mechanize does not execute JavaScript. The site you are trying to access also says 'Please enable scripts...'.

Since JavaScript cannot be enabled in mechanize, I would personally suggest using PhantomJS.
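To check whether a field is present in the HTML that the server actually returned (as opposed to being added later by a script), one option is to list the input names found in the raw source. The following is only a sketch using the Python 3 standard library; the class name, the helper name and the sample markup are made up for illustration and are not taken from the site.

```python
from html.parser import HTMLParser


class InputNameCollector(HTMLParser):
    """Collect the name attribute of every <input> tag in a document."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (attribute, value) pairs
        if tag == 'input':
            name = dict(attrs).get('name')
            if name:
                self.names.append(name)


def input_names(html):
    """Return the names of all <input> elements present in the HTML string."""
    parser = InputNameCollector()
    parser.feed(html)
    parser.close()
    return parser.names


# Hypothetical markup standing in for the downloaded page source
sample = ('<form><input type="hidden" name="token" value="x">'
          '<input type="text" name="search"></form>')

print(input_names(sample))
print('nexpartuname' in input_names(sample))
```

If the name you are looking for is missing from that list, mechanize cannot find it either, because it only sees the markup that was served.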

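But the real problem here is not JavaScript; it is the URLs. Because the URLs on that site are relative, the HTML does not behave as expected once you download it and open it locally.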
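A small illustration of why a saved copy breaks, assuming Python 3's `urllib.parse`: the same root-relative link resolves to the site when the page is served from it, but to the local disk when the page is opened from a file. The link and the local file path are hypothetical.

```python
from urllib.parse import urljoin, urlparse

# Hypothetical root-relative link, as it might appear in the page source
link = '/Style/site.css'

# Resolved against the live page, the link points at the site
online = urljoin('http://www.usautoforce.com/Pages/home.aspx', link)

# Resolved against a locally saved copy (hypothetical path), it points at the local disk
local = urljoin('file:///home/user/us.html', link)

print(online)
print(urlparse(local).scheme, urlparse(local).path)
```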

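You have to convert all relative URLs to absolute URLs. Run this code before writing the HTML to the file, and write the html_replaced string to the file instead of the html string.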

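import re

# html holds the page source, e.g. html = br.response().read()
latest_index = 0
html_replaced = ""

for m in re.finditer('(href|src)(=")(/[^"]+")', html):
    # Copy the text before the match, then the attribute with the site root prepended
    html_replaced += html[latest_index:m.start()] + m.group(1) + m.group(2) + 'http://www.usautoforce.com' + m.group(3)
    latest_index = m.end()
html_replaced += html[latest_index:]  # keep everything after the last match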
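The same rewrite can also be sketched as a reusable function built on `re.sub`. This is one possible variant, not the only way to do it: the function name and the `BASE_URL` constant are introduced here for illustration, and the pattern adds a `(?!/)` lookahead (not in the code above) so that protocol-relative values such as `//cdn.example.com/...` are left untouched.

```python
import re

BASE_URL = 'http://www.usautoforce.com'  # site root, as used in the code above

# Matches href="/..." or src="/..." but skips protocol-relative "//..." values
ROOT_RELATIVE = re.compile(r'(href|src)(=")(/(?!/)[^"]*")')


def make_urls_absolute(html, base=BASE_URL):
    """Prefix every root-relative href/src value with base."""
    return ROOT_RELATIVE.sub(
        lambda m: m.group(1) + m.group(2) + base + m.group(3), html)


# Hypothetical markup for demonstration
sample = ('<a href="/Pages/home.aspx">Home</a>'
          '<img src="/img/logo.png">'
          '<a href="http://example.com/">External</a>')

print(make_urls_absolute(sample))
```

Because `re.sub` copies the unmatched text itself, there is no index bookkeeping and the text after the last match cannot be lost. Like the original loop, it only handles double-quoted attributes that start with a slash.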

But when I opened the page manually in a browser with JavaScript disabled, it worked. – user3809411

@user3809411 You are right. The real problem is the relative URLs. Please check the updated answer. –

Thank you. It works now. – user3809411