How to download a file hidden behind (served by) an HTML page with Python?

I'm trying to download a file from the internet with Python. I've tried these snippets:

import urllib.request

URL = 'http://www.mediafire.com/download/raju14e8aq6azbo/Getting+Started+with+MediaFire.pdf'
filename = "file.pdf"
urllib.request.urlretrieve(URL, filename)

and:

from urllib.request import urlopen
from shutil import copyfileobj

URL = 'http://www.mediafire.com/download/raju14e8aq6azbo/Getting+Started+with+MediaFire.pdf'
filename = "file.pdf"
# Stream the HTTP response body straight into the local file
with urlopen(URL) as in_stream, open(filename, 'wb') as out_file:
    copyfileobj(in_stream, out_file)

(I found this last snippet here: What command to use instead of urllib.request.urlretrieve?)

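The problem is that both snippets download an HTML file instead of the .pdf file I need, "Getting Started with MediaFire.pdf"! I'm looking for a way to download the file that is hidden behind (served by) the HTML page.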

Any suggestions?


2015-07-13 Vinciuz

This might help: http://www.blog.pythonlibrary.org/2012/06/07/python-101-how-to-download-a-file/ –

It's a bit odd that your URL ends in .pdf and you also save it under the file name file.pdf –

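Please define "corrupted"! Is the file empty? Does it contain something else? It may be a good idea to look at its contents in a hex or text editor; there might be an HTML error page inside. –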
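A rough programmatic version of that hex-editor check: PDF files normally start with the bytes %PDF-, while an HTML error or landing page usually starts with <!DOCTYPE html or <html. The sketch below classifies a saved file from its first 512 bytes; the helper names sniff_bytes and sniff_file are hypothetical, and the heuristics are assumptions rather than a full file-type detector.

```python
def sniff_bytes(head: bytes) -> str:
    """Classify the first bytes of a download as "pdf", "html" or "unknown"."""
    if head.startswith(b"%PDF-"):  # PDF files normally begin with this signature
        return "pdf"
    # Skip a UTF-8 byte-order mark and leading whitespace before looking for HTML
    if head.startswith(b"\xef\xbb\xbf"):
        head = head[3:]
    text = head.lstrip().lower()
    if text.startswith((b"<!doctype html", b"<html")):
        return "html"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the start of a saved file and classify it (heuristic only)."""
    with open(path, "rb") as fh:
        return sniff_bytes(fh.read(512))
```

For example, sniff_file("file.pdf") returning "html" would confirm that the download saved the landing page rather than the PDF.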

This happens because the link you are trying to download is not a PDF file; it is an HTML document. You can open it in Chrome, Firefox or any other browser.
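One way to catch this in code is to look at the response's Content-Type before writing anything to disk. This is a minimal standard-library sketch; the names ensure_pdf and download_pdf are hypothetical, and the strict application/pdf check is an assumption (some servers label PDFs as application/octet-stream, in which case the check would need loosening).

```python
from email.message import Message
from shutil import copyfileobj
from urllib.request import urlopen


def ensure_pdf(headers: Message) -> None:
    """Raise ValueError unless the headers announce a PDF body."""
    # get_content_type() drops parameters such as "; charset=utf-8"
    content_type = headers.get_content_type()
    if content_type != "application/pdf":
        raise ValueError(f"expected application/pdf, got {content_type}")


def download_pdf(url: str, filename: str) -> None:
    """Save url to filename, refusing to write an HTML landing page."""
    with urlopen(url) as resp:
        # resp.headers is an http.client.HTTPMessage, a subclass of Message
        ensure_pdf(resp.headers)  # check before anything is written to disk
        with open(filename, "wb") as out_file:
            copyfileobj(resp, out_file)
```

With the landing-page URL from the question, download_pdf would be expected to raise ValueError reporting text/html instead of silently saving the HTML.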

You need to find the correct link to download. Try "Save As" in the browser; if that gives you the PDF, the Python code will work too.

Just because a URL ends in ".pdf" doesn't mean it really is a PDF. For your example, the correct link is http://download834.mediafire.com/dsq8ih5dubng/raju14e8aq6azbo/Getting+Started+with+MediaFire.pdf; with Ctrl+S, wget or curl, that link actually downloads the file.
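To let Python do what the browser does here, one approach is to fetch the landing page and pull the direct link out of its HTML. The sketch below assumes the page contains a plain <a href=...> pointing at the real file, whose path ends in .pdf; whether MediaFire's page actually does this (rather than building the link with JavaScript) is an assumption, and LinkCollector, find_download_links and resolve_direct_link are hypothetical names.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect absolute <a href> URLs whose path ends with a given suffix."""

    def __init__(self, base_url: str, suffix: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.suffix = suffix.lower()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(self.base_url, href)  # resolve relative links
        if urlparse(absolute).path.lower().endswith(self.suffix):
            self.links.append(absolute)


def find_download_links(html: str, base_url: str, suffix: str = ".pdf") -> list[str]:
    """Return candidate file links found in the page, in document order."""
    parser = LinkCollector(base_url, suffix)
    parser.feed(html)
    return parser.links


def resolve_direct_link(page_url: str, suffix: str = ".pdf") -> str:
    """Fetch the landing page and return the first link that is not the page itself."""
    with urlopen(page_url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    links = [u for u in find_download_links(html, page_url, suffix) if u != page_url]
    if not links:
        raise LookupError("no direct link found; the page may build it with JavaScript")
    return links[0]
```

The URL returned by resolve_direct_link could then be passed to urlretrieve or urlopen as in the question. The two direct links in this thread differ (download834.../dsq8ih5dubng/... and download1063.../qjhujh1ajzwg/...), so resolving the link at run time seems safer than hard-coding one.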


2015-07-13 13:04:20 AbdealiJK

I tried the correct URL: http://www.mediafire.com/download/raju14e8aq6azbo/Getting+Started+with+MediaFire.pdf, but the problem is the same! – Vinciuz

Can you explain what you mean by "try using 'Save As' in the browser"? – Vinciuz

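@Vinciuz That's still not the correct URL; can you check the one I mentioned above? And by "Save As" I mean: press Ctrl+S on the keyboard, or right-click anywhere on the page and choose "Save As". – AbdealiJK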

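Sorry, sometimes I'm the laziest person in the world! JDK is right: I kept using the wrong URL, and even when JDK told me to change it, I replaced it with another wrong URL!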

I'm marking JDK's answer as correct. Below is the code I finally used:

# Python 3 port of the original Python 2 code (urllib2 and fpformat no longer exist)
from urllib.request import urlopen

url = "http://download1063.mediafire.com/qjhujh1ajzwg/raju14e8aq6azbo/Getting+Started+with+MediaFire.pdf"

file_name = url.split('/')[-1]
with urlopen(url) as u, open(file_name, 'wb') as f:
    file_size = int(u.headers["Content-Length"])
    print("Downloading: %s Bytes: %s" % (file_name, file_size))
    print()

    file_size_dl = 0
    # Read in roughly 110 blocks; never let the block size drop to 0 for tiny files
    block_sz = max(file_size // 110, 1)
    print(block_sz)

    while True:
        buffer = u.read(block_sz)
        if not buffer:  # empty read means the download is finished
            break
        file_size_dl += len(buffer)
        f.write(buffer)
        status = file_size_dl * 100 / file_size
        print("%.1f %% - %d bytes of %d bytes" % (status, file_size_dl, file_size))
print("complete!")

It isn't the most useful code; I'm working on a faster and more correct version and will post it below as soon as I can!
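A hedged sketch of what a faster, more robust variant might look like (not the author's later code): read with a fixed chunk size instead of file_size/110, and fall back to a plain byte count when the server sends no Content-Length. The 64 KiB CHUNK_SIZE and the names copy_with_progress and download are assumptions.

```python
import io
import sys
from typing import Optional
from urllib.request import urlopen

CHUNK_SIZE = 64 * 1024  # assumed read size; larger chunks mean fewer read() calls


def copy_with_progress(src: io.BufferedIOBase, dst: io.BufferedIOBase,
                       total: Optional[int] = None,
                       chunk_size: int = CHUNK_SIZE,
                       out: Optional[io.TextIOBase] = None) -> int:
    """Copy src to dst chunk by chunk, printing progress; return bytes copied."""
    if out is None:
        out = sys.stdout
    copied = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        dst.write(chunk)
        copied += len(chunk)
        if total:
            print(f"\r{copied * 100 / total:5.1f} % - {copied} of {total} bytes",
                  end="", file=out)
        else:  # no Content-Length: only the running byte count is known
            print(f"\r{copied} bytes", end="", file=out)
    print(file=out)  # finish the progress line
    return copied


def download(url: str, filename: str) -> int:
    """Download url to filename with a single-line progress display."""
    with urlopen(url) as resp, open(filename, "wb") as fh:
        length = resp.headers.get("Content-Length")
        return copy_with_progress(resp, fh, int(length) if length else None)
```

Printing with "\r" and end="" keeps the progress on one line instead of printing a new line for every block as the loop above does.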


2015-07-14 08:54:16 Vinciuz
