Amazon S3 makes it easy to host static websites. Naturally, since the sites are static, there is no server-side "sugar" like dynamic HTTP headers that depend on the user's request. Unfortunately, this also means S3 can't choose between a compressed and an uncompressed response depending on the request.
At the same time, modern web pages keep growing: one with jQuery and Bootstrap can easily reach 300+ KB. Downloading several such files makes response time even worse. Search engines penalize sites with high latency, and no matter how fast your server is, it will most likely not reduce latency as much as compression does. And it's not only about search engines; it's quite reasonable on its own — slow response times make users unhappy. Do you remember how you feel when your OS takes several minutes to boot? It's the same.
So the lack of compression sucks. Really. And no, I don't know a way to make S3 behave like a full web server. But it doesn't matter. Can you name a browser that doesn't support gzip compression? Do you know anyone who uses one? Personally, I don't. Even lynx supports gzipped responses. So most likely nothing bad happens if we make our site gzip-only. Yes, I know, I dreamed about a better solution too, but this is the best I've found so far. If you can suggest something better, please contact me; I'd be happy to hear about your approach. Meanwhile, gzipping web pages before uploading them to the S3 bucket works quite well for me, and I'm going to describe how I use it.
And finally, one more piece of bad news (the last one, I promise): gzipping the same file several times produces archives with different hash sums, because gzip stores the input file's timestamp in the archive header, so regenerated files compress to different bytes even when their content is identical. At the same time, tools like s3cmd synchronize a bucket with a local directory based on hash sums and object sizes. This means you can't simply re-gzip the whole site every time you update a page. Actually you can, but it would either require manual synchronization with the bucket, which is a huge pain in the ass, or cause useless synchronization of unchanged pages that you have to pay for.
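The hash problem is easy to reproduce. A quick demo (file names here are throwaway examples): copying a file gives it a new modification time, and plain gzip bakes that timestamp into the archive, so byte-identical inputs produce different archives. The -n flag, which omits the timestamp and original name, makes the output deterministic:

```shell
printf 'hello\n' > a.txt
sleep 1
cp a.txt b.txt              # same content, newer mtime

gzip -c a.txt > a.gz
gzip -c b.txt > b.gz        # a.gz and b.gz differ: gzip embeds the mtime

gzip -nc a.txt > a2.gz
gzip -nc b.txt > b2.gz      # identical: -n omits timestamp and name
```

This is exactly why a naive "re-gzip everything" deploy re-uploads every page.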
So what do I suggest? Automated synchronization of only the changed files. For this purpose I keep two directories: one with the latest uncompressed content, and another with the published gzipped content. Before publication, the latest content is compared with the extracted gzipped content; if they differ, the latest version is compressed and replaces the gzipped one in the second directory. If they are the same, nothing happens, and the gzipped page stays untouched, as intended. This way only the changed pages are synchronized by s3cmd.
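The compare-and-compress step above can be sketched in a few lines of shell. This is a minimal illustration, not my exact script; the directory names output and s3_publication are placeholders:

```shell
mkdir -p output s3_publication
printf '<h1>Hello</h1>\n' > output/index.html   # a demo page

cd output
find . -type f | while read -r f; do
    gz="../s3_publication/$f"
    mkdir -p "$(dirname "$gz")"
    # Re-compress only if the page is new or its content changed;
    # otherwise the existing archive keeps its hash and s3cmd skips it.
    if [ ! -f "$gz" ] || ! gzip -dc "$gz" | cmp -s - "$f"; then
        gzip -c "$f" > "$gz"
    fi
done
cd ..
```

Note that the compressed copies keep their original names (index.html, not index.html.gz), so the S3 object keys stay browser-friendly.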
As a static site generator I use Pelican, so my website is built with make. Deploying to Amazon S3 will involve these targets:
While the publish target just compiles the website and puts the resulting content into the $(OUTPUT) directory, the compress target gzips the changed files, putting them into $(S3_PUBLICATION_DIR), and s3_gzip_upload synchronizes that local directory with the S3 bucket, setting some useful headers.
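Wired together, the three targets could look roughly like this. This is a hedged sketch, not Pelican's stock Makefile: the bucket name is a placeholder, and compress_changed.sh stands for a hypothetical helper script doing the compare-and-compress step:

```make
OUTPUT=output
S3_PUBLICATION_DIR=s3_publication
S3_BUCKET=s3://example-bucket

publish:
	pelican content -o $(OUTPUT) -s publishconf.py

compress: publish
	./compress_changed.sh $(OUTPUT) $(S3_PUBLICATION_DIR)

s3_gzip_upload: compress
	s3cmd sync $(S3_PUBLICATION_DIR)/ $(S3_BUCKET) --acl-public \
		--add-header 'Content-Encoding:gzip'
```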
Now, about synchronization with the S3 bucket. There's a tool called s3cmd that is easy to install. As mentioned previously, it can synchronize a local directory with an S3 bucket using the MD5 hash function. The important part for compression is the Content-Encoding: gzip HTTP header: it tells the browser that it should decompress the page before rendering it. Since it's a publicly available website, the --acl-public option specifies that objects can be requested anonymously.
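Put together, the sync invocation might look like the following command-line fragment (the bucket and directory names are placeholders):

```shell
s3cmd sync s3_publication/ s3://example-bucket \
    --acl-public \
    --add-header 'Content-Encoding:gzip'
```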
Defining the character set in an HTTP header helps the browser render pages faster; it's even recommended over specifying it in a meta tag inside the web pages. Another rendering acceleration approach is to add the Cache-Control: max-age header. It tells the browser that, once loaded, this HTTP resource (S3 object) will be cached, and any access to it within max-age seconds loads it from the local cache, not from the server. It's an amazing approach if you have huge
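Adding the cache header to the same sync command is one more --add-header flag; the value of 86400 seconds (one day) below is an arbitrary example, not a recommendation:

```shell
s3cmd sync s3_publication/ s3://example-bucket --acl-public \
    --add-header 'Content-Encoding:gzip' \
    --add-header 'Cache-Control:max-age=86400'
```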