2012-02-22
11

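Why isn't Varnish sending 304 Not Modified when the If-Modified-Since header is sent?

When I send a GET request directly to the backend with If-Modified-Since: Wed, 15 Feb 2012 07:25:00 CET set, Apache correctly returns a 304 with no content.

When I send the same request through Varnish 3.0.2, it responds with a 200 and resends all of the content, even though the client already has it. Obviously, this is not a good use of bandwidth. My understanding is that Varnish handles this header intelligently and should send a 304, so I assume my .vcl file is doing something wrong.

varnishlog gives this: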

16 SessionOpen c 84.97.17.233 64416 :80 
    16 ReqStart  c 84.97.17.233 64416 1597323690 
    16 RxRequest c GET 
    16 RxURL  c /fr/CS/CS_AU-Maboreke-6-6-2004.pdf 
    16 RxProtocol c HTTP/1.0 
    16 RxHeader  c Host: www.quotaproject.org 
    16 RxHeader  c User-Agent: Sprawk/1.3 (http://www.sprawk.com/) 
    16 RxHeader  c Accept: */* 
    16 RxHeader  c Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7 
    16 RxHeader  c Connection: close 
    16 RxHeader  c If-Modified-Since: Wed, 15 Feb 2012 07:25:00 CET 
    16 VCL_call  c recv lookup 
    16 VCL_call  c hash 
    16 Hash   c /fr/CS/CS_AU-Maboreke-6-6-2004.pdf 
    16 Hash   c www.quotaproject.org 
    16 VCL_return c hash 
    16 Hit   c 1597322756 
    16 VCL_call  c hit 
    16 VCL_acl  c NO_MATCH CTRLF5 
    16 VCL_return c deliver 
    16 VCL_call  c deliver deliver 
    16 TxProtocol c HTTP/1.1 
    16 TxStatus  c 200 
    16 TxResponse c OK 
    16 TxHeader  c Server: Apache 
    16 TxHeader  c Last-Modified: Wed, 09 Jun 2004 16:07:50 GMT 
    16 TxHeader  c Vary: Accept-Encoding 
    16 TxHeader  c Content-Type: application/pdf 
    16 TxHeader  c Date: Wed, 22 Feb 2012 18:25:05 GMT 
    16 TxHeader  c Age: 12432 
    16 TxHeader  c Connection: close 
    16 Gzip   c U D - 107685 115763 80 796748 861415 
    16 Length  c 98304 
    16 ReqEnd  c 1597323690 1329935105.713264704 1329935106.208528996 0.000071526 0.000068426 0.495195866 
    16 SessionClose c EOF mode 
    16 StatSess  c 84.97.17.233 64416 0 1 1 0 0 0 203 98304 

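If I understand this correctly, the object is already in Varnish's cache, so Varnish does not need to contact the backend. But it already knows the Last-Modified date, so why doesn't it respond with a 304?

This is my VCL file: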

backend idea { 
    # .host = "www.idea.int"; 
    .host = "83.145.60.235"; # IDEA's public website IP 
    .port = "80"; 
} 
backend qp { 
    # .host = "www.quotaproject.org"; 
    .host = "83.145.60.235"; # IDEA's public website IP 
    .port = "80"; 
} 
# 
#Below is a commented-out copy of the default VCL logic. If you 
#redefine any of these subroutines, the built-in logic will be 
#appended to your code. 
# 
sub vcl_recv { 
    # force domain so that Apache handles the VH correctly 
    if (req.http.host ~ "^qp" || req.http.host ~ "quotaproject.org$") { 
    set req.http.Host = "www.quotaproject.org"; 
    set req.backend = qp; 
    } else { 
    # default to idea.int 
    set req.http.Host = "www.idea.int"; 
    set req.backend = idea; 
    } 
    # Before anything else we need to fix gzip compression 
    if (req.http.Accept-Encoding) { 
     if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$") { 
      # No point in compressing these 
      remove req.http.Accept-Encoding; 
     } else if (req.http.Accept-Encoding ~ "gzip") { 
      set req.http.Accept-Encoding = "gzip"; 
     } else if (req.http.Accept-Encoding ~ "deflate") { 
      set req.http.Accept-Encoding = "deflate"; 
     } else { 
      # unknown algorithm 
      remove req.http.Accept-Encoding; 
     } 
    } 
    # AJAX requests bypass the cache. TODO: Make sure your JavaScript implementation for AJAX actually sets XMLHttpRequest 
    if (req.http.X-Requested-With == "XMLHttpRequest") { 
     return(pass); 
    } 
    if (req.request != "GET" && 
    req.request != "HEAD" && 
    req.request != "PUT" && 
    req.request != "POST" && 
    req.request != "TRACE" && 
    req.request != "OPTIONS" && 
    req.request != "DELETE") { 
    /* Non-RFC2616 or CONNECT which is weird. */ 
    return (pipe); 
    } 
    # Purge everything url - this isn't the squid way, but works 
    if (req.url ~ "^/varnishpurge") { 
     if (!client.ip ~ purge) { 
      error 405 "Not allowed."; 
     } 
     if (req.url == "/varnishpurge") { 
      ban("req.http.host == " + req.http.host + " && req.url ~ ^/"); 
      error 841 "Purged site."; 
     } 
     else { 
      ban("req.http.host == " + req.http.host + " && req.url ~ ^" + regsub(req.url, "^/varnishpurge(.*)$", "\1") + "$"); 
      error 842 "Purged page."; 
     } 
    } 
    # spoof the client IP (taken from http://utvbloggen.se/snabb-guide-till-varnish/) 
    remove req.http.X-Forwarded-For; 
    set req.http.X-Forwarded-For = client.ip; 
    # Force delivery from cache even if other things indicate otherwise 
    if (req.url ~ "\.(flv)") { 
    # pipe flash start away 
    return(pipe); 
    } 
    if (req.url ~ "\.(jpg|jpeg|gif|png|tiff|tif|svg|swf|ico|css|vsd|doc|ppt|pps|xls|pdf|mp3|mp4|m4a|ogg|mov|avi|wmv|sxw|zip|gz|bz2|tgz|tar|rar|odc|odb|odf|odg|odi|odp|ods|odt|sxc|sxd|sxi|sxw|dmg|torrent|deb|msi|iso|rpm)$") { 
    # cookies are irrelevant here 
    unset req.http.Cookie; 
    unset req.http.Authorization; 
    } 
    # Force short-circuit to the real site for these dynamic pages 
    if (req.url ~ "/customcf/" || req.url ~ "/uid/editData.cfm" || req.url ~ "^/private/") { 
    return(pass); 
    } 
    # Remove user agent, since Apache will serve these resources the same way 
    if (req.http.User-Agent) { 
    set req.http.User-Agent = ""; 
    } 
    if (req.http.Cookie) { 
    # removes all cookies named __utm? (utma, utmb...) - tracking thing 
    set req.http.Cookie = regsuball(req.http.Cookie, "(^|;) *__utm.=[^;]+;? *", "\1"); 
    # remove cStates for RHM boxes (the server doesn't need to know these, JS will handle this client-side) 
    set req.http.cookie = regsub(req.http.cookie, "(;)?cStates=[^;]*", ""); #cStates might sometimes have a blank value 
    # remove ColdFusion session cookie stuff 
    if (!req.url ~ "^/publications/" && !req.url ~ "^/uid/admin/") { 
     set req.http.cookie = regsub(req.http.cookie, "(;)?CFID=[^;]+", ""); 
     set req.http.cookie = regsub(req.http.cookie, "(;)?CFTOKEN=[^;]+", ""); 
    } 
    # Remove the cookie header if it's empty after cleanup 
    if (req.http.cookie ~ "^;? *$") { 
     # The only cookie data left is a semicolon or spaces 
     remove req.http.cookie; 
    } 
    } 
} 
# 
# Called when the requested object was found in the cache 
# 
sub vcl_hit { 
    # Allow administrators to easily flush the cache from their browser 
    if (client.ip ~ CTRLF5) { 
    if (req.http.pragma ~ "no-cache" || req.http.Cache-Control ~ "no-cache") { 
     set obj.ttl = 0s; 
     return(pass); 
    } 
    } 
} 
# 
# Called when the requested object has been retrieved from the 
# backend, or the request to the backend has failed 
# 
sub vcl_fetch { 
    set beresp.grace = 1h; 
    # strip the cookie before the image is inserted into cache. 
    if (req.url ~ "\.(jpg|jpeg|gif|png|tiff|tif|svg|swf|ico|css|vsd|doc|ppt|pps|xls|pdf|mp3|mp4|m4a|ogg|mov|avi|wmv|sxw|zip|gz|bz2|tgz|tar|rar|odc|odb|odf|odg|odi|odp|ods|odt|sxc|sxd|sxi|sxw|dmg|torrent|deb|msi|iso|rpm)$") { 
    remove beresp.http.set-cookie; 
    set beresp.ttl = 100w; 
    } 
    # Remove CF session cookies for everything but the publications subsite 
    if (!req.url ~ "^/publications/" && !req.url ~ "/customcf/" && !req.url ~ "^/uid/admin/" && !req.url ~ "^/uid/editData.cfm") { 
    remove beresp.http.set-cookie; 
    } 
    if (beresp.ttl < 48h) { 
    set beresp.ttl = 48h; 
    } 
} 
# 
# Called before a cached object is delivered to the client 
# 
sub vcl_deliver { 
    # We'll be hiding some headers added by Varnish. We want to make sure people are not seeing we're using Varnish. 
    remove resp.http.X-Varnish; 
    remove resp.http.Via; 
    # We'd like to hide the X-Powered-By headers. Nobody has to know we can run PHP and have version xyz of it. 
    remove resp.http.X-Powered-By; 
} 

Can anyone see the problem or problems?

Update: According to http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.3

Note: When handling an If-Modified-Since header field, some 
     servers will use an exact date comparison function, rather than a 
     less-than function, for deciding whether to send a 304 (Not 
     Modified) response. 

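It looks like this might be Varnish's behavior. The date I am sending is later than the file's real last-modified date, but it is not exactly the date cached in Varnish.

Answers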
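To make the two strategies in the RFC note concrete, here is a minimal sketch that applies both to the dates from the log above. It is only a model of the comparison, not a description of how Varnish or Apache actually implement it. The function names are hypothetical, GNU date is assumed for the -d option, and the If-Modified-Since value is written as its GMT equivalent (07:25:00 CET is 06:25:00 GMT).

```shell
#!/bin/sh
# Sketch only: models the two comparison strategies from the RFC note.
# Assumes GNU date (for -d); all function names are hypothetical.

# Convert an HTTP date expressed in GMT to seconds since the epoch.
to_epoch() {
    LC_ALL=C date -u -d "$1" +%s
}

# "Less-than" strategy: 304 if Last-Modified is not later than If-Modified-Since.
# $1 = Last-Modified, $2 = If-Modified-Since
status_lenient() {
    if [ "$(to_epoch "$1")" -le "$(to_epoch "$2")" ]; then
        echo 304
    else
        echo 200
    fi
}

# "Exact" strategy: 304 only if the two dates are identical.
status_exact() {
    if [ "$(to_epoch "$1")" -eq "$(to_epoch "$2")" ]; then
        echo 304
    else
        echo 200
    fi
}

# Dates taken from the varnishlog output above.
status_lenient "Wed, 09 Jun 2004 16:07:50 GMT" "Wed, 15 Feb 2012 06:25:00 GMT"   # prints 304
status_exact   "Wed, 09 Jun 2004 16:07:50 GMT" "Wed, 15 Feb 2012 06:25:00 GMT"   # prints 200
```

If a server used the exact strategy, the request in the log would get a 200, which matches what I see, but this alone does not prove that is what Varnish does.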

2

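Since this question is still unanswered and has several upvotes, I'll post an answer.

This does not appear to be a problem with Varnish 3.0.0 (which we are using), nor with the version of Varnish currently running on your site.

A 200 OK response, with content, when the If-Modified-Since date is older than the Last-Modified date: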

# curl -z "Wed, 09 Jun 2010 16:07:50 GMT" --head "www.quotaproject.org/robots.txt" 
HTTP/1.1 200 OK 
Server: Apache 
Last-Modified: Tue, 22 Jan 2013 13:23:41 GMT 
Vary: Accept-Encoding 
Cache-Control: public 
Content-Type: text/plain; charset=UTF-8 
Date: Mon, 25 Nov 2013 15:00:45 GMT 
Age: 69236 
Connection: keep-alive 
X-Cache: HIT 

A 304 response when the If-Modified-Since date is later than the Last-Modified date:

# curl -z "Wed, 09 Jun 2013 16:07:50 GMT" --head "www.quotaproject.org/robots.txt" 
HTTP/1.1 304 Not Modified 
Server: Apache 
Last-Modified: Tue, 22 Jan 2013 13:23:41 GMT 
Vary: Accept-Encoding 
Cache-Control: public 
Content-Type: text/plain; charset=UTF-8 
Date: Mon, 25 Nov 2013 15:00:52 GMT 
Age: 69243 
Connection: keep-alive 
X-Cache: HIT 

The same goes for the example you gave in your varnishlog output:

# curl -z "Wed, 15 Feb 2012 07:25:00 CET" --head "www.quotaproject.org/fr/CS/CS_AU-Maboreke-6-6-2004.pdf" 
HTTP/1.1 304 Not Modified 
Server: Apache 
Last-Modified: Wed, 09 Jun 2004 16:07:50 GMT 
Cache-Control: public 
Content-Type: application/pdf 
Accept-Ranges: bytes 
Date: Mon, 25 Nov 2013 15:08:48 GMT 
Age: 335802 
Connection: keep-alive 
X-Cache: HIT 

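I would say Varnish is working as expected. Perhaps this was an issue with the Varnish version you were using, or something was off in the test method. I also can't see anything wrong with your VCL.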

7

The problem is the non-GMT time zone in the If-Modified-Since request header:

If-Modified-Since: Wed, 15 Feb 2012 07:25:00 CET 

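According to http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.3

All HTTP date/time stamps MUST be represented in Greenwich Mean Time (GMT), without exception.

Varnish treats this as a strict requirement, whereas Apache handles non-standard date formats more leniently. That is why you observe different behavior when you query Apache directly.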
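One way to avoid the problem on the client side is to convert the timestamp to GMT before sending it. The following is a minimal sketch, assuming GNU date is available; http_date is a hypothetical helper name, and the CET timestamp is written with its numeric offset (+0100).

```shell
#!/bin/sh
# Sketch only: http_date is a hypothetical helper; assumes GNU date (for -d).
# Prints the given timestamp as an HTTP date, always expressed in GMT.
http_date() {
    LC_ALL=C date -u -d "$1" "+%a, %d %b %Y %H:%M:%S GMT"
}

# 07:25:00 at UTC+1 (CET) is 06:25:00 GMT.
http_date "2012-02-15 07:25:00 +0100"   # prints: Wed, 15 Feb 2012 06:25:00 GMT
```

The result can then be sent verbatim, for example with curl --head -H "If-Modified-Since: $(http_date '2012-02-15 07:25:00 +0100')" "www.quotaproject.org/fr/CS/CS_AU-Maboreke-6-6-2004.pdf".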