What are the options for the gzip_proxied directive for?

George Reith picture George Reith · Oct 27, 2015 · Viewed 8.6k times · Source

The gzip_proxied directive allows for the following options (non-exhaustive):

  • expired
    enables compression if a response header includes the “Expires” field with a value that disables caching;
  • no-cache
    enables compression if a response header includes the “Cache-Control” field with the “no-cache” parameter;
  • no-store
    enables compression if a response header includes the “Cache-Control” field with the “no-store” parameter;
  • private
    enables compression if a response header includes the “Cache-Control” field with the “private” parameter;
  • no_last_modified
    enables compression if a response header does not include the “Last-Modified” field;
  • no_etag
    enables compression if a response header does not include the “ETag” field;
  • auth
    enables compression if a request header includes the “Authorization” field;

I can't see any rational reason to use most of these options. For example, why would whether or not a proxied request contains the Authorization header, or Cache-Control: private, affect whether or not I want to gzip it?

Given that old versions of Nginx strip ETags from responses when gzipping them, I can see a use case for no_etag: if you don't have Nginx configured to generate ETags for your gzipped responses, you may prefer to pass on an uncompressed response with an ETag rather than generate a compressed one without an ETag.

I can't figure out the others, though.

What are the intended use cases of each of these options?

Answer

Alex Nauda picture Alex Nauda · Oct 31, 2015

From the admin guide: (emphasis mine)

The directive has a number of parameters specifying which kinds of proxied requests NGINX should compress. For example, it is reasonable to compress responses only to requests that will not be cached on the proxy server. For this purpose the gzip_proxied directive has parameters that instruct NGINX to check the Cache-Control header field in a response and compress the response if the value is no-cache, no-store, or private. In addition, you must include the expired parameter to check the value of the Expires header field. These parameters are set in the following example, along with the auth parameter, which checks for the presence of the Authorization header field (an authorized response is specific to the end user and is not typically cached)

I'd agree that not compressing cacheable responses is reasonable. Consider that the primary savings of caching at a proxy is to increase performance (response time) and reduce the time and bandwidth that the proxy spends in requesting the upstream resource. The tradeoff to gain these performance benefits is the cost of cache storage. Here are some use cases where not compressing cacheable responses make sense:

  1. In the normal web traffic of many sites, non-personalized responses (which constitute the majority of cacheable responses) have already been optimized through techniques like script minification, image size optimization, etc., in a web build process. While these static resources might shrink slightly from compression, the CPU cost of trying to gzip them smaller is probably not an efficient use of the proxy layer machine resources. But dynamically generated pages, served to logged-in users, containing tons of application-generated content would very likely benefit from compression (and would typically not be cacheable).

  2. You are setting up a proxy in front of a costly upstream service, but you are serving responses to another proxy that will be responsible for compression for each user agent. For example, if you have a CDN that makes multiple requests to the same costly upstream resource (from separate geographical edge locations) and you want to ensure that you can reuse the costly response. If the CDN caches uncompressed versions (to service both compressed and uncompressed user agents) you may be compressing at your proxy only to have them uncompress again at the CDN, which is simply a waste of hardware and electricity on both sides, to reduce bandwidth in the highest-bandwidth part of the chain. (Response gzip compression is most beneficial at the last mile, to get the response data to your user's phone which has dropped to one dot of signal as they enter the subway.)

  3. For sizable response entities, requests may come in (from various user agents, but often via downstream CDN intermediaries) for byte ranges of the resource, to user agents that don't support compression. The CDN is likely to serve byte range requests from its own cache, provided that it has an uncompressed version already in its cache.