When I send a GET request directly to the backend with the header If-Modified-Since: Wed, 15 Feb 2012 07:25:00 CET set, Apache correctly returns a 304 with no content.
When I send the same request through Varnish 3.0.2, it responds with a 200 and resends all the content even though the client already has it. Obviously, this isn't a good use of bandwidth. My understanding is that Varnish supports intelligent handling of this header and should be sending a 304, so I figure I've done something wrong in my .vcl file.
Varnishlog gives this:
16 SessionOpen c 84.97.17.233 64416 :80
16 ReqStart c 84.97.17.233 64416 1597323690
16 RxRequest c GET
16 RxURL c /fr/CS/CS_AU-Maboreke-6-6-2004.pdf
16 RxProtocol c HTTP/1.0
16 RxHeader c Host: www.quotaproject.org
16 RxHeader c User-Agent: Sprawk/1.3 (http://www.sprawk.com/)
16 RxHeader c Accept: */*
16 RxHeader c Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
16 RxHeader c Connection: close
16 RxHeader c If-Modified-Since: Wed, 15 Feb 2012 07:25:00 CET
16 VCL_call c recv lookup
16 VCL_call c hash
16 Hash c /fr/CS/CS_AU-Maboreke-6-6-2004.pdf
16 Hash c www.quotaproject.org
16 VCL_return c hash
16 Hit c 1597322756
16 VCL_call c hit
16 VCL_acl c NO_MATCH CTRLF5
16 VCL_return c deliver
16 VCL_call c deliver deliver
16 TxProtocol c HTTP/1.1
16 TxStatus c 200
16 TxResponse c OK
16 TxHeader c Server: Apache
16 TxHeader c Last-Modified: Wed, 09 Jun 2004 16:07:50 GMT
16 TxHeader c Vary: Accept-Encoding
16 TxHeader c Content-Type: application/pdf
16 TxHeader c Date: Wed, 22 Feb 2012 18:25:05 GMT
16 TxHeader c Age: 12432
16 TxHeader c Connection: close
16 Gzip c U D - 107685 115763 80 796748 861415
16 Length c 98304
16 ReqEnd c 1597323690 1329935105.713264704 1329935106.208528996 0.000071526 0.000068426 0.495195866
16 SessionClose c EOF mode
16 StatSess c 84.97.17.233 64416 0 1 1 0 0 0 203 98304
If I understand this correctly, the object is already in Varnish's cache, so Varnish doesn't need to contact the backend. But it already knows the Last-Modified value, so why doesn't it respond with a 304?
And here's my VCL file:
backend idea {
# .host = "www.idea.int";
.host = "83.145.60.235"; # IDEA's public website IP
.port = "80";
}
backend qp {
# .host = "www.quotaproject.org";
.host = "83.145.60.235"; # IDEA's public website IP
.port = "80";
}
#
#Below is a commented-out copy of the default VCL logic. If you
#redefine any of these subroutines, the built-in logic will be
#appended to your code.
#
sub vcl_recv {
# force domain so that Apache handles the VH correctly
if (req.http.host ~ "^qp" || req.http.host ~ "quotaproject.org$") {
set req.http.Host = "www.quotaproject.org";
set req.backend = qp;
} else {
# default to idea.int
set req.http.Host = "www.idea.int";
set req.backend = idea;
}
# Before anything else we need to fix gzip compression
if (req.http.Accept-Encoding) {
if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$") {
# No point in compressing these
remove req.http.Accept-Encoding;
} else if (req.http.Accept-Encoding ~ "gzip") {
set req.http.Accept-Encoding = "gzip";
} else if (req.http.Accept-Encoding ~ "deflate") {
set req.http.Accept-Encoding = "deflate";
} else {
# unknown algorithm
remove req.http.Accept-Encoding;
}
}
# ajax requests bypass cache. TODO: Make sure your JavaScript implementation for AJAX actually sets XMLHttpRequest
if (req.http.X-Requested-With == "XMLHttpRequest") {
return(pass);
}
if (req.request != "GET" &&
req.request != "HEAD" &&
req.request != "PUT" &&
req.request != "POST" &&
req.request != "TRACE" &&
req.request != "OPTIONS" &&
req.request != "DELETE") {
/* Non-RFC2616 or CONNECT which is weird. */
return (pipe);
}
# Purge everything url - this isn't the squid way, but works
if (req.url ~ "^/varnishpurge") {
if (!client.ip ~ purge) {
error 405 "Not allowed.";
}
if (req.url == "/varnishpurge") {
ban("req.http.host == " + req.http.host + " && req.url ~ ^/");
error 841 "Purged site.";
}
else {
ban("req.http.host == " + req.http.host + " && req.url ~ ^" + regsub( req.url, "^/varnishpurge(.*)$", "\1" ) + "$");
error 842 "Purged page.";
}
}
# spoof the client IP (taken from http://utvbloggen.se/snabb-guide-till-varnish/)
remove req.http.X-Forwarded-For;
set req.http.X-Forwarded-For = client.ip;
# Force delivery from cache even if other things indicate otherwise
if (req.url ~ "\.(flv)") {
# pipe flash start away
return(pipe);
}
if (req.url ~ "\.(jpg|jpeg|gif|png|tiff|tif|svg|swf|ico|css|vsd|doc|ppt|pps|xls|pdf|mp3|mp4|m4a|ogg|mov|avi|wmv|sxw|zip|gz|bz2|tgz|tar|rar|odc|odb|odf|odg|odi|odp|ods|odt|sxc|sxd|sxi|sxw|dmg|torrent|deb|msi|iso|rpm)$") {
# cookies are irrelevant here
unset req.http.Cookie;
unset req.http.Authorization;
}
# Force short-circuit to the real site for these dynamic pages
if (req.url ~ "/customcf/" || req.url ~ "/uid/editData.cfm" || req.url ~ "^/private/") {
return(pass);
}
# Remove user agent, since Apache will serve these resources the same way
if (req.http.User-Agent) {
set req.http.User-Agent = "";
}
if (req.http.Cookie) {
# removes all cookies named __utm? (utma, utmb...) - tracking thing
set req.http.Cookie = regsuball(req.http.Cookie, "(^|; ) *__utm.=[^;]+;? *", "\1");
# remove cStates for RHM boxes (the server doesn't need to know these, JS will handle this client-side)
set req.http.cookie = regsub(req.http.cookie, "(; )?cStates=[^;]*", ""); #cStates might sometimes have a blank value
# remove ColdFusion session cookie stuff
if (!req.url ~ "^/publications/" && !req.url ~ "^/uid/admin/") {
set req.http.cookie = regsub(req.http.cookie, "(; )?CFID=[^;]+", "");
set req.http.cookie = regsub(req.http.cookie, "(; )?CFTOKEN=[^;]+", "");
}
# Remove the cookie header if it's empty after cleanup
if (req.http.cookie ~ "^;? *$") {
# The only cookie data left is a semicolon or spaces
remove req.http.cookie;
}
}
}
#
# Called when the requested object was found in the cache
#
sub vcl_hit {
# Allow administrators to easily flush the cache from their browser
if (client.ip ~ CTRLF5) {
if (req.http.pragma ~ "no-cache" || req.http.Cache-Control ~ "no-cache") {
set obj.ttl = 0s;
return(pass);
}
}
}
#
# Called when the requested object has been retrieved from the
# backend, or the request to the backend has failed
#
sub vcl_fetch {
set beresp.grace = 1h;
# strip the cookie before the image is inserted into cache.
if (req.url ~ "\.(jpg|jpeg|gif|png|tiff|tif|svg|swf|ico|css|vsd|doc|ppt|pps|xls|pdf|mp3|mp4|m4a|ogg|mov|avi|wmv|sxw|zip|gz|bz2|tgz|tar|rar|odc|odb|odf|odg|odi|odp|ods|odt|sxc|sxd|sxi|sxw|dmg|torrent|deb|msi|iso|rpm)$") {
remove beresp.http.set-cookie;
set beresp.ttl = 100w;
}
# Remove CF session cookies for everything but the publications subsite
if (!req.url ~ "^/publications/" && !req.url ~ "/customcf/" && !req.url ~ "^/uid/admin/" && !req.url ~ "^/uid/editData.cfm") {
remove beresp.http.set-cookie;
}
if (beresp.ttl < 48h) {
set beresp.ttl = 48h;
}
}
#
# Called before a cached object is delivered to the client
#
sub vcl_deliver {
# We'll be hiding some headers added by Varnish. We want to make sure people are not seeing we're using Varnish.
remove resp.http.X-Varnish;
remove resp.http.Via;
# We'd like to hide the X-Powered-By headers. Nobody has to know we can run PHP and have version xyz of it.
remove resp.http.X-Powered-By;
}
Can anyone see the problem or problems?
Update: According to http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.3
Note: When handling an If-Modified-Since header field, some
servers will use an exact date comparison function, rather than a
less-than function, for deciding whether to send a 304 (Not
Modified) response.
It seems this may be Varnish's behaviour. The date I'm sending is later than the file's real Last-Modified date, but it is not exactly the value that is cached in Varnish.
The problem is the non-GMT time zone in the If-Modified-Since request header:
If-Modified-Since: Wed, 15 Feb 2012 07:25:00 CET
According to http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.3
All HTTP date/time stamps MUST be represented in Greenwich Mean Time (GMT), without exception.
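Varnish implements this as a strict requirement: as far as I can tell, it ignores an If-Modified-Since date it cannot parse and delivers the full 200 response. Apache handles nonstandard date formats more leniently, which is why you observed different behavior when querying Apache directly. Sending the same instant in GMT (Wed, 15 Feb 2012 06:25:00 GMT) should get you the 304 from Varnish as well.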
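If you control the client that builds the header, converting the timestamp to GMT before sending it is the fix. Here is a minimal Python sketch of that conversion. The helper names to_http_date and parse_strict_http_date are made up for illustration, the CET offset is hard-coded to UTC+1 (no daylight saving), and the strict parser is only a simplified model of a GMT-only check, not Varnish's actual code:

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Assumption: a fixed UTC+1 offset stands in for CET (no daylight saving).
CET = timezone(timedelta(hours=1), "CET")


def to_http_date(dt):
    """Format a timezone-aware datetime as an RFC 1123 date in GMT."""
    if dt.tzinfo is None:
        raise ValueError("datetime must be timezone-aware")
    # format_datetime(usegmt=True) requires the datetime to be in UTC.
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)


def parse_strict_http_date(value):
    """Hypothetical strict parser: accept only RFC 1123 dates ending in GMT.

    Returns an aware UTC datetime, or None when the value does not match.
    """
    try:
        parsed = datetime.strptime(value, "%a, %d %b %Y %H:%M:%S GMT")
    except ValueError:
        return None
    return parsed.replace(tzinfo=timezone.utc)


# The instant from the question: 07:25:00 CET on 15 Feb 2012.
local_time = datetime(2012, 2, 15, 7, 25, 0, tzinfo=CET)
header_value = to_http_date(local_time)

print("If-Modified-Since: " + header_value)
# If-Modified-Since: Wed, 15 Feb 2012 06:25:00 GMT

# The original header is rejected by the strict check; the GMT one is accepted.
print(parse_strict_http_date("Wed, 15 Feb 2012 07:25:00 CET"))  # None
print(parse_strict_http_date(header_value) is not None)  # True
```

Per the RFC note quoted in the question, the safest option is for the client to echo back the exact Last-Modified string it received earlier, which is already in GMT.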