2014-12-01 125 views

I am using this Python script to scan multiple PDF files in a folder with the online scanner "https://wepawet.iseclab.org/":

import mechanize
import re
import os

def upload_file(uploaded_file):
    url = "https://wepawet.iseclab.org/"
    br = mechanize.Browser()
    br.set_handle_robots(False)  # ignore robots
    br.open(url)
    br.select_form(nr=0)
    f = os.path.join("200", uploaded_file)
    br.form.add_file(open(f), 'text/plain', f)
    br.form.set_all_readonly(False)
    res = br.submit()
    content = res.read()
    with open("200_clean.html", "a") as f:
        f.write(content)

def main():
    for file in os.listdir("200"):
        upload_file(file)

if __name__ == '__main__':
    main()

But I get the following error after running the code:

Traceback (most recent call last):
  File "test.py", line 56, in <module>
    main()
  File "test.py", line 50, in main
    upload_file(file)
  File "test.py", line 40, in upload_file
    res = br.submit()
  File "/home/suleiman/Desktop/mechanize/_mechanize.py", line 541, in submit
    return self.open(self.click(*args, **kwds))
  File "/home/suleiman/Desktop/mechanize/_mechanize.py", line 203, in open
    return self._mech_open(url, data, timeout=timeout)
  File "/home/suleiman/Desktop/mechanize/_mechanize.py", line 255, in _mech_open
    raise response
mechanize._response.httperror_seek_wrapper: HTTP Error refresh: The HTTP server returned a redirect error that would lead to an infinite loop.
The last 30x error message was:
OK

Can anyone help me solve this problem?


It looks to me as though the design of the website is what causes it. – HarryCBurn 2014-12-01 22:36:10


How do you think I could solve it? – 2014-12-01 22:38:29


I'm not sure, sorry. I assume it has nothing to do with your code, but I may be wrong. – HarryCBurn 2014-12-01 22:43:24

Answers


I think the problem is that the MIME type is set to text/plain. For a PDF, this should be application/pdf. When I uploaded a sample PDF, your code worked with this change.

Change the br.form.add_file call to look like this:

br.form.add_file(open(f), 'application/pdf', f)
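If the folder might ever hold more than PDFs, one option is to derive the MIME type from the file name instead of hard-coding it. The following is only a sketch using the standard-library mimetypes module. The helper names guess_content_type and pdf_paths are hypothetical and do not come from the question or from mechanize.

```python
import mimetypes
import os


def guess_content_type(path, default="application/octet-stream"):
    """Return a MIME type based on the file extension (hypothetical helper)."""
    content_type, _encoding = mimetypes.guess_type(path)
    # guess_type returns None for unknown extensions, so fall back to a default.
    return content_type or default


def pdf_paths(folder):
    """Yield paths of the .pdf files directly inside folder, in sorted order."""
    for name in sorted(os.listdir(folder)):
        if name.lower().endswith(".pdf"):
            yield os.path.join(folder, name)
```

With these helpers, the upload call could be written as br.form.add_file(open(f, "rb"), guess_content_type(f), f). Opening the file in binary mode is an extra precaution I am assuming here, since PDFs are binary data. The answer above only changed the MIME type.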