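Get the current version of a web page

How do I get the current version of a web page with Wget or Python? I need caching turned off completely.

I am trying to write code that downloads http://robocademy.com/courses/arduino/get_code/ every second. With Python's urllib and with Wget I do not get the current file the way I do in Chrome. I tried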

wget --cache=off --user-agent="Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)" http://robocademy.com/courses/arduino/get_code/

and urllib together with urllib.urlcleanup

Response headers in Chrome:

Accept-Ranges:bytes 
Age:0 
Connection:keep-alive 
Content-Encoding:gzip 
Content-Length:449 
Content-Type:text/plain 
Date:Wed, 28 Nov 2012 23:20:24 GMT 
Server:nginx 
Vary:Accept-Encoding 
Via:1.1 varnish 
X-Varnish:400211059

Response headers in Wget:

HTTP/1.1 200 OK 
Server: nginx 
Content-Type: text/plain 
Keep-Alive: timeout=20 
Vary: Accept-Encoding 
Transfer-Encoding: chunked 
Date: Wed, 28 Nov 2012 23:22:20 GMT 
X-Varnish: 400216320 400212892 
Age: 76 
Via: 1.1 varnish 
Connection: keep-alive
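
The two header dumps above differ in Age and X-Varnish, and that difference can be checked programmatically. The following is only a rough sketch: parse_headers and looks_cached are hypothetical helper names, and the rule it applies (a positive Age, or two IDs in X-Varnish, suggests the response came out of the Varnish cache) is a heuristic, not a guarantee. The sample strings are a subset of the headers quoted above.

```python
# Subset of the response headers quoted in the question.
CHROME_HEADERS = """Age:0
Via:1.1 varnish
X-Varnish:400211059"""

WGET_HEADERS = """HTTP/1.1 200 OK
X-Varnish: 400216320 400212892
Age: 76
Via: 1.1 varnish"""


def parse_headers(raw):
    """Parse 'Name: value' lines into a dict keyed by lowercase name."""
    headers = {}
    for line in raw.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # the status line 'HTTP/1.1 200 OK' has no colon and is skipped
            headers[name.strip().lower()] = value.strip()
    return headers


def looks_cached(headers):
    """Heuristic: positive Age or two X-Varnish IDs suggests a cache hit."""
    age = int(headers.get("age") or 0)
    varnish_ids = headers.get("x-varnish", "").split()
    return age > 0 or len(varnish_ids) > 1


print(looks_cached(parse_headers(CHROME_HEADERS)))  # False
print(looks_cached(parse_headers(WGET_HEADERS)))    # True
```

By this reading, the Chrome response was generated fresh while the Wget response was a 76-second-old cached copy.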

Source

2012-11-28 Timothy Clemans

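What is the question? – Lior

How do I get the current version of a web page with Wget or Python? –

You *are* getting the current version of the web page, as the server is configured to serve it. Is there a specific reason why you want to override the server's configuration against the administrator's intent? –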

-1

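You can try adding --no-cache to wget. According to the manual:

Disable server-side cache. In this case, Wget will send the remote server an appropriate directive ('Pragma: no-cache') to get the file from the remote service, rather than returning the cached version. This is especially useful for retrieving and flushing out-of-date documents on proxy servers.

Caching is allowed by default.

cache=off is meant for the wgetrc file.

For Python, you could consider this answer.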
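
As a minimal sketch of the Python side, the same 'Pragma: no-cache' directive quoted from the manual (plus its HTTP/1.1 counterpart, Cache-Control: no-cache) can be sent with urllib.request. The function names build_no_cache_request and fetch_fresh are made up for illustration, and so is the optional "_ts" query parameter, which assumes the server ignores unknown parameters. Whether the cache in front of the server honors any of this depends on how it is configured, so treat it as something to try rather than a guaranteed fix.

```python
import time
import urllib.request
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

URL = "http://robocademy.com/courses/arduino/get_code/"


def build_no_cache_request(url, bust_cache=False):
    """Build a request that asks intermediate caches for a fresh copy."""
    if bust_cache:
        # Assumption: the server ignores the made-up "_ts" parameter, while
        # a cache keyed on the full URL sees every request as a new object.
        parts = urlsplit(url)
        query = parse_qsl(parts.query, keep_blank_values=True)
        query.append(("_ts", str(time.time_ns())))
        url = urlunsplit(parts._replace(query=urlencode(query)))
    headers = {
        "Cache-Control": "no-cache",  # HTTP/1.1 request directive
        "Pragma": "no-cache",         # directive named in the Wget manual
    }
    return urllib.request.Request(url, headers=headers)


def fetch_fresh(url, bust_cache=False, timeout=10):
    """Perform one request; return (body bytes, Age header or None)."""
    request = build_no_cache_request(url, bust_cache=bust_cache)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read(), response.headers.get("Age")
```

Calling fetch_fresh(URL) in a loop with time.sleep(1) gives the once-per-second download from the question. If the returned Age is still greater than zero, the cache is ignoring the request headers, and fetch_fresh(URL, bust_cache=True) may be worth trying.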

Source

2012-11-28 23:48:11 Bula

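His problem is not with wget, but with python .... – tink

Are you sure? The question is: How do I get the current version of a web page with Wget or Python? – Bula

Line 4 of my question says I tried --no-cache –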
