I'm working on a video app and storing the files on AWS S3. Using the default URL, like https://***.amazonaws.com/***, works fine, but I have decided to use CloudFront, which is faster for content delivery. Using CloudFront, I keep getting 403 (Forbidden) with this URL: https://***.cloudfront.net/***. Did I miss anything?
Everything works fine until I decide to load the contents from CloudFront which points to my bucket.
Any solution please?
When restricting access to S3 content using a bucket policy that inspects the incoming Referer: header, you need to do a little bit of custom configuration to "outsmart" CloudFront.
It's important to understand that CloudFront is designed to be a well-behaved cache. By "well-behaved," I mean that CloudFront is designed to never return a response that differs from what the origin server would have returned. I'm sure you can see that is an important factor.
Let's say I have a web server (not S3) behind CloudFront, and my web site is designed so that it returns different content based on an inspection of the Referer: header... or any other HTTP request header, like User-Agent: for example. Depending on your browser, I might return different content. How would CloudFront know this, so that it would avoid serving a user the wrong version of a certain page?
The answer is, it wouldn't be able to tell -- it can't know this. So, CloudFront's solution is not to forward most request headers to my server at all. What my web server can't see, it can't react to, so the content I return cannot vary based on headers I don't receive, which prevents CloudFront from caching and returning the wrong response, based on those headers. Web caches have an obligation to avoid returning the wrong cached content for a given page.
"But wait," you object. "My site depends on the value from a certain header in order to determine how to respond." Right, that makes sense... so we have to tell CloudFront this:
Instead of caching my pages based on just the requested path, I need you to also forward the Referer: or User-Agent: or one of several other headers as sent by the browser, and cache the response for use on other requests that include not only the same path, but also the same values for the extra header(s) that you forward to me.
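To make the caching rule concrete, here is a toy sketch of a cache whose key is the path plus whatever headers are forwarded. It is not CloudFront's actual implementation; ToyCache, the origin callable, and the example URLs are names made up for illustration.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical types for this illustration only.
CacheKey = Tuple[str, Tuple[Tuple[str, str], ...]]
Origin = Callable[[str, Dict[str, str]], str]


class ToyCache:
    """A toy cache that keys responses on path + forwarded header values."""

    def __init__(self, origin: Origin, forwarded: Iterable[str] = ()) -> None:
        self.origin = origin
        # Header names are case-insensitive, so normalize to lowercase.
        self.forwarded = tuple(name.lower() for name in forwarded)
        self.store: Dict[CacheKey, str] = {}
        self.origin_requests = 0

    def get(self, path: str, headers: Dict[str, str]) -> str:
        lowered = {name.lower(): value for name, value in headers.items()}
        # The origin only ever sees the headers that are forwarded.
        visible = {name: lowered[name] for name in self.forwarded if name in lowered}
        key: CacheKey = (path, tuple(sorted(visible.items())))
        if key not in self.store:
            self.origin_requests += 1
            self.store[key] = self.origin(path, visible)
        return self.store[key]
```

With no forwarded headers, two requests for the same path with different Referer: values share one cached response. With Referer: forwarded, each distinct value gets its own cache entry and its own trip to the origin.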
However, when the origin server is S3, CloudFront doesn't support forwarding most request headers, on the assumption that since static content is unlikely to vary, these headers would just cause it to cache multiple identical responses unnecessarily.
Your solution is to not tell CloudFront that you're using S3 as the origin. Instead, configure your distribution to use a "custom" origin, and give it the hostname of the bucket to use as the origin server hostname.
Then, you can configure CloudFront to forward the Referer: header to the origin, and your S3 bucket policy that denies/allows requests based on that header will work as expected.
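For reference, a bucket policy of this kind typically uses the aws:Referer condition key. The sketch below builds one as a Python dict; the bucket name and referer pattern are placeholders, and your actual policy may be shaped differently (for example, an explicit Deny rather than an Allow).

```python
import json
from typing import Dict, List


def referer_bucket_policy(bucket: str, allowed_referers: List[str]) -> Dict:
    """Build a policy allowing s3:GetObject only for matching Referer values."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowGetFromKnownReferers",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # StringLike allows * wildcards in the referer patterns.
                "Condition": {"StringLike": {"aws:Referer": allowed_referers}},
            }
        ],
    }


if __name__ == "__main__":
    # "example-bucket" and the referer pattern are placeholders.
    policy = referer_bucket_policy("example-bucket", ["https://www.example.com/*"])
    print(json.dumps(policy, indent=2))
```

If S3 never receives the Referer: header, the condition can't match, which is consistent with the 403 you are seeing through CloudFront.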
Well, almost as expected. This will lower your cache hit ratio somewhat, since now the cached pages will be cached based on path + referring page. If an S3 object is referenced by more than one of your site's pages, CloudFront will cache a copy for each unique request. It sounds like a limitation, but really, it's only an artifact of proper cache behavior -- whatever gets forwarded to the back-end, almost all of it, must be used to determine whether that particular response is usable for servicing future requests.
See http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/distribution-web-values-specify.html#DownloadDistValuesForwardHeaders for configuring CloudFront to whitelist specific headers to send to your origin server.
Important: don't forward any headers you don't need, since every variant request reduces your hit rate further. Particularly when using S3 as the back-end for a custom origin, do not forward the Host: header, because that is probably not going to do what you expect. Select the Referer: header here, and test. S3 should begin to see the header and react accordingly.
Note that when you removed your bucket policy for testing, CloudFront would have continued to serve the cached error page unless you flushed your cache by sending an invalidation request, which causes CloudFront to purge all cached pages matching the path pattern you specify, over the course of about 15 minutes. The easiest thing to do when experimenting is to just create a new CloudFront distribution with the new configuration, since there is no charge for the distributions themselves.
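If you'd rather invalidate than create a new distribution, a minimal sketch using boto3 might look like the following. It assumes boto3 is installed and AWS credentials are configured; the helper names and the distribution ID you pass in are placeholders of mine.

```python
import time
from typing import Dict, List, Optional


def build_invalidation_batch(paths: List[str],
                             caller_reference: Optional[str] = None) -> Dict:
    """Build the InvalidationBatch structure expected by create_invalidation."""
    if not paths:
        raise ValueError("at least one path is required")
    return {
        "Paths": {"Quantity": len(paths), "Items": list(paths)},
        # CallerReference must be unique per request; a timestamp is one option.
        "CallerReference": caller_reference or f"invalidate-{int(time.time())}",
    }


def invalidate(distribution_id: str, paths: List[str]) -> Dict:
    """Send the invalidation request (needs boto3 and AWS credentials)."""
    import boto3  # third-party; imported lazily so the helper above has no dependency

    client = boto3.client("cloudfront")
    return client.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=build_invalidation_batch(paths),
    )
```

For example, invalidate(your_distribution_id, ["/*"]) would request a purge of everything in the distribution.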
When viewing the response headers from CloudFront, note the X-Cache: (hit/miss) and Age: (how long ago this particular page was cached) headers. These are also useful in troubleshooting.
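As a rough aid when troubleshooting, a small helper can summarize those two headers. This is only a sketch: describe_cache_status is a made-up name, and it assumes X-Cache: values of the form "Hit from cloudfront" or "Miss from cloudfront".

```python
from typing import Dict, Optional, Union


def describe_cache_status(headers: Dict[str, str]) -> Dict[str, Union[str, Optional[int]]]:
    """Summarize the X-Cache and Age response headers from a CloudFront response."""
    lowered = {name.lower(): value for name, value in headers.items()}

    # Take the first word of X-Cache ("Hit", "Miss", "Error", ...) as the status.
    words = lowered.get("x-cache", "").split()
    status = words[0].lower() if words else "unknown"

    # Age is the number of seconds the object has been in the cache, if present.
    age_text = lowered.get("age", "").strip()
    age_seconds = int(age_text) if age_text.isdigit() else None

    return {"status": status, "age_seconds": age_seconds}
```

You can feed it the headers from any HTTP client. A response served from cache normally carries an Age: value, while a fresh fetch from the origin usually has none.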
Update: @alexjs has made an important observation: instead of doing this using the bucket policy and forwarding the Referer: header to S3 for analysis -- which will hurt your cache ratio to an extent that varies with the spread of resources over referring pages -- you can use the new AWS Web Application Firewall service, which allows you to impose filtering rules against incoming requests to CloudFront, to allow or block requests based on string matching in request headers.
For this, you'd need to connect the distribution to S3 as an S3 origin (the normal configuration, contrary to what I proposed in the solution above with a "custom" origin) and use the built-in capability of CloudFront to authenticate back-end requests to S3 (so the bucket contents aren't directly accessible if requested from S3 directly by a malicious actor).
See https://www.alexjs.eu/preventing-hotlinking-using-cloudfront-waf-and-referer-checking/ for more on this option.