Friday, September 6, 2013

Speed up a web site by enabling Apache file gzip compression

Setting up the server

The "good news" is that we can't control the browser. It either sends the Accept-encoding: gzip, deflate header or it doesn't.
Our job is to configure the server so it returns zipped content if the browser can handle it, saving bandwidth for everyone (and giving us a happy user).
For IIS, enable compression in the settings.
In Apache, enabling output compression is fairly straightforward. Add the following to your .htaccess file:

# compress text, html, javascript, css, xml:
AddOutputFilterByType DEFLATE text/plain
AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType DEFLATE text/xml
AddOutputFilterByType DEFLATE text/css
AddOutputFilterByType DEFLATE application/xml
AddOutputFilterByType DEFLATE application/xhtml+xml
AddOutputFilterByType DEFLATE application/rss+xml
AddOutputFilterByType DEFLATE application/javascript
AddOutputFilterByType DEFLATE application/x-javascript

# Or, compress certain file types by extension:
<files *.html>
SetOutputFilter DEFLATE
</files>

Apache actually has two compression options:
  • mod_deflate is easier to set up and is standard.
  • mod_gzip seems more powerful: you can pre-compress content.
Deflate is quick and works, so I use it; use mod_gzip if that floats your boat. In either case, Apache checks if the browser sent the "Accept-encoding" header and returns the compressed or regular version of the file. However, some older browsers may have trouble (more below) and there are special directives you can add to correct this.
If you can't change your .htaccess file, you can use PHP to return compressed content. Give your HTML file a .php extension and add this code to the top:
In PHP:
<?php if (substr_count($_SERVER['HTTP_ACCEPT_ENCODING'], 'gzip')) ob_start("ob_gzhandler"); else ob_start(); ?>
We check the "Accept-encoding" header and return a gzipped version of the file (otherwise the regular version). This is almost like building your own webserver (what fun!). But really, try to use Apache to compress your output if you can help it. You don't want to monkey with your files.

How to enable file compression

Apache 1.x and 2.x can automatically compress files, but neither one comes with a compressor enabled by default. Enabling compression reduces CSS, HTML, and JavaScript file sizes by 55-65% and speeds up overall page load times by 35-40%.
Apache uses plug-in modules to add functionality. For Apache 1.x, use the free mod_gzip module to compress files. For Apache 2.x, use mod_gzip or the built-in mod_deflatemodule.
Enable file compression using mod_gzip
The mod_gzip module can be used with Apache 1.x or 2.x, but it doesn’t come with either Apache distribution. You’ll need to download and install it separately.
  1. Windows:
    1. Log in to your PC using an account with administrator privileges.
    2. Download the zip file containing ApacheModuleGzip.dll from SourceForge.
    3. Unzip the file.
    4. Move ApacheModuleGzip.dll to your Apache modules folder (typically “c:\Program Files\Apache Group\Apache\modules”).
    5. Edit your server configuration file using a text editor like NotePad (typically “c:\Program Files\Apache Group\Apache\conf\httpd.conf”). Add the following line to your server configuration file as the last loaded module:
      LoadModule gzip_module modules/ApacheModuleGzip.dll
      Add the following lines to your server configuration file or to a site’s “.htaccess” file:
      <IfModule mod_gzip.c>
          mod_gzip_on       Yes
          mod_gzip_dechunk  Yes
          mod_gzip_item_include file      \.(html?|txt|css|js|php|pl)$
          mod_gzip_item_include handler   ^cgi-script$
          mod_gzip_item_include mime      ^text/.*
          mod_gzip_item_include mime      ^application/x-javascript.*
          mod_gzip_item_exclude mime      ^image/.*
          mod_gzip_item_exclude rspheader ^Content-Encoding:.*gzip.*
      </IfModule>
  2. Mac:
    1. Log in to your Mac using an account with administrator privileges.
    2. Download the zip file containing the module’s C source code from SourceForge.
    3. Unzip the file.
    4. Compile the module using the included instructions.
    5. Move mod_gzip.so to your Apache modules folder (typically “/usr/libexec/httpd”).
    6. Edit your server configuration file using a text editor like TextEdit or vim (typically “/etc/httpd/httpd.conf”). Add the following line to your server configuration file as the last loaded module:
      LoadModule gzip_module libexec/mod_gzip.so
      Add the following lines to your server configuration file or to a site’s “.htaccess” file:
      <IfModule mod_gzip.c>
          mod_gzip_on       Yes
          mod_gzip_dechunk  Yes
          mod_gzip_item_include file      \.(html?|txt|css|js|php|pl)$
          mod_gzip_item_include handler   ^cgi-script$
          mod_gzip_item_include mime      ^text/.*
          mod_gzip_item_include mime      ^application/x-javascript.*
          mod_gzip_item_exclude mime      ^image/.*
          mod_gzip_item_exclude rspheader ^Content-Encoding:.*gzip.*
      </IfModule>
                          
  3. Restart Apache.
The “LoadModule” line in the configuration file makes the module ready, while the other lines configure and enable it. Put these other lines in the server’s configuration file to affect all sites served by the web server. Or put them within a site’s “VirtualHost” block or in its own “.htaccess” file to affect only that site.
The remaining lines tell the module to compress files with .htm, .html, .txt, .css, .js, .php, and .pl file name extensions, the output of CGI scripts, and any output that is text or JavaScript, but not images. The last line tells the module to skip compressing content that is already compressed.

Enable file compression using mod_deflate

The mod_deflate module comes with Apache 2.x. All you need to do is enable it.
  1. Windows:
    1. Log in to your PC using an account with administrator privileges.
    2. Edit your server configuration file using a text editor like NotePad (typically “c:\Program Files\Apache Group\Apache\conf\httpd.conf”). Add the following lines to your server configuration file or to a site’s “.htaccess” file:
      <Location />
          SetOutputFilter DEFLATE
            SetEnvIfNoCase Request_URI  \
              \.(?:gif|jpe?g|png)$ no-gzip dont-vary
          SetEnvIfNoCase Request_URI  \
              \.(?:exe|t?gz|zip|gz2|sit|rar)$ no-gzip dont-vary
      </Location> 
  2. Mac:
    1. Log in to your Mac using an account with administrator privileges.
    2. Edit your server configuration file using a text editor like TextEdit or vim (typically “/etc/httpd/httpd.conf”). Add the following lines to your server configuration file or to a site’s “.htaccess” file:
      <Location />
          SetOutputFilter DEFLATE
          SetEnvIfNoCase Request_URI  \
              \.(?:gif|jpe?g|png)$ no-gzip dont-vary
          SetEnvIfNoCase Request_URI  \
              \.(?:exe|t?gz|zip|gz2|sit|rar)$ no-gzip dont-vary
      </Location> 
  3. Restart Apache.
Put the configuration lines in the server’s configuration file to affect all sites served by the web server. Or put them within a site’s “VirtualHost” block or in its “.htaccess” file to affect only that site.
The “SetOutputFilter” line enables the module.
The next two lines instruct the module to skip compressing image files (.gif, .jpg, .jpeg, .png), executables (.exe), and compressed files (.gz, .tgz, .zip, .gz2, .sit, .rar). Everything else gets compressed.

Verify Your Compression

Once you've configured your server, check to make sure you're actually serving up compressed content.
  • Online: Use the online gzip test to check whether your page is compressed.
  • In your browser: Use Web Developer Toolbar > Information > View Document Size (like I did for Yahoo, above) to see whether the page is compressed.
  • View the headers: Use Live HTTP Headers to examine the response. Look for a line that says "Content-encoding: gzip".

Compressed CSS Compression using PHP

Method One

Overview: This method involves adding a small PHP script to your CSS document and replacing its.css extension with a .php extension.
Place the following PHP script into the top of the CSS document that you wish to compress. Then change the .css extension to .php, to arrive at something similar to: compressed-css.php. Remember to use the new name when referencing the file.
<?php 
   ob_start ("ob_gzhandler");
   header ("content-type: text/css; charset: UTF-8");
   header ("cache-control: must-revalidate");
   $offset = 60 * 60;
   $expire = "expires: " . gmdate ("D, d M Y H:i:s", time() + $offset) . " GMT";
   header ($expire);
?>
Here is the same PHP script commented with functional explanations:
<?php

   // initialize ob_gzhandler function to send and compress data
   ob_start ("ob_gzhandler");

   // send the requisite header information and character set
   header ("content-type: text/css; charset: UTF-8");

   // check cached credentials and reprocess accordingly
   header ("cache-control: must-revalidate");

   // set variable for duration of cached content
   $offset = 60 * 60;

   // set variable specifying format of expiration header
   $expire = "expires: " . gmdate ("D, d M Y H:i:s", time() + $offset) . " GMT";

   // send cache expiration header to the client broswer
   header ($expire); 
  
   // output css contents
   readfile("some_location/some_name.css"); 
?>
Functional Summary: The previous PHP function will first check to see if the browser requesting the file will accept "gzip-deflate" encoding. If no such support is detected, the requested file is sent without compression. Next, the function sends a header for the content type and character set (in this case, "text/css" and "UTF-8"). Then, a "must-revalidate" "cache-control" header requires revalidation against currently specified variables. Finally, an "expires" header specifies the time duration for which the cached content should persist (one hour in this case).

Method Two

Overview: This method involves placing the PHP script in a separate .php file and adding a set of rules to an .htaccess file.
A more discrete, unobtrusive method for compressing CSS involves two steps. First, save the script provided in the first method (above) as a seperate gzip-css.php file and place it in a CSS-exclusive directory. Then, add the following ruleset to an .htaccess file located in the same CSS-exclusive directory (i.e., the CSS directory should contain only CSS files):
# css compression htaccess ruleset
AddHandler application/x-httpd-php .css
php_value auto_prepend_file gzip-css.php
php_flag zlib.output_compression On
Here is the same htaccess ruleset commented with functional explanations:
# css compression htaccess ruleset

# process all CSS files in current directory as PHP
AddHandler application/x-httpd-php .css

# prepend the PHP script to all PHP files in the current directory
php_value auto_prepend_file gzip-css.php

# compress all parsed PHP pages from current directory
# this rule is redundantly present as the first line of the PHP script
php_flag zlib.output_compression On
Functional Summary: The .htaccess rules above first instruct Apache to parse all CSS files in the current directory as PHP. After this, Apache is instructed to insert the contents of the "gzip-css.php" file into the beginning of each PHP (i.e., CSS) file parsed from the current directory. And finally, Apache is instructed to compress automatically every parsed document in the current directory.

Confirmed Browsers

  • Internet Explorer 5 and up: works great
  • Netscape Navigator 6 and up: works fine
  • Mozilla/Firefox: all versions seem to work
  • Opera: does not cache compressed CSS

References

Leverage browser caching: How To Control Browser Caching with Apache 2 or htaccess

Examining Your HTTP Headers - The other way of finding out if your site is taking advantage of browser caching is to examine the HTTP headers on any given server response, be it a whole page or a single file. You can do this using the Live HTTP Headers Chrome mentioned above. A common HTTP header response will look something like this.
HTTP/1.1 200 OK
Date: Sun, 19 Feb 2006 16:42:05 GMT
Server: Apache/2.0.55 (Unix) DAV/2 PHP/5.1.1
Last-Modified: Sun, 08 Jan 2006 16:17:25 GMT
ETag: "73049-defc-37c52f40"
Accept-Ranges: bytes
Content-Length: 57084
Cache-Control: max-age=7200
Expires: Sun, 19 Feb 2006 18:42:05 GMT
Connection: close
Content-Type: image/png

What we are looking for in the header are the "Cache-Control" and "Expires" fields. These fields control how long the browser will cache this media or page asset from your server. Having a low value like "1" in "Cache-Control" can be just as bad as no value at all.

If you have those fields present, you've got a head start on the situation. If not, you may need to make sure Apache 2 is loading the mod_expires and mod_headers modules. Most installations and builds of Apache 2 include these modules since they are pretty essential. If you are using the Apache 2 in OS X Server's opt directory, this is the case and you only need to make sure that they are turned on by opening the http.conf file and making sure the following two lines do not have an # sign in front of them.
LoadModule expires_module modules/mod_expires.so
LoadModule headers_module modules/mod_headers.so

When these modules are loaded and working, we can then start to use the correct Apache 2 directives to control how browsers cache everything from pages to images across the whole site or in a specific directory.

Using Apache 2 To Control Browser Caching

Now that we are here, I can assume that you have both the mod_expires and mod_headers modules compiled and loaded into your Apache 2 installation. I can also assume that you have examined a few headers from your server's responses and determined that the "Cache-Control" and "Expires" fields are either not set or they are configured at such a low value to be ineffective. Lastly, you may have bypassed some header examinations and just determined that browser caching is not happening by tailing your Apache 2 log file and monitoring redundant requests. Either way, let's get to fixing Apache 2 to control browser caching that is right for you and your particular site.


Here is a code snippet of an Apache 2 directive that we will be using. This directive can be modified to suite your tastes or it can just be used "as is" for most users. The directives here can be placed into the <Directory> directive of your virtual host in http.conf or it can be placed loosely in a .htaccess file in the root of your website.

<IfModule mod_expires.c>
ExpiresActive On
ExpiresDefault "access plus 1 seconds"
ExpiresByType text/html "access plus 1 seconds"
ExpiresByType image/gif "access plus 120 minutes"
ExpiresByType image/jpeg "access plus 120 minutes"
ExpiresByType image/png "access plus 120 minutes"
ExpiresByType text/css "access plus 60 minutes"
ExpiresByType text/javascript "access plus 60 minutes"
ExpiresByType application/x-javascript "access plus 60 minutes"
ExpiresByType text/xml "access plus 60 minutes"
</IfModule>


Let's examine what is happening here in brief. If you are interested full documentation of the expires directive can be found on Apache's website along with different syntax formats than the ones used here. I like this format since it is inheritably legible. This directive will do the following:
  • Set the default expiration of content in the browser cache to 1 second past the time of accessing that content. This is good for setting a catchall or default if you fail to explicitly define a content type in the following directives.
  • Set the expiration of text/html pages to 1 second. My content management system Drupal does this already in its .htaccess file, but I include it here if you wish to change it. I think this is a good setting since technically most html pages are small and I like to err on the side of caution and always want my page content to be fresh. For instance, I may make changes to my global template and want it to be visible immediately.
  • Set the expiration of standard images like GIFF, JPEG, and PNG to 2 hours.
  • Set the expiration of CSS and JavaScript to 1 hour.
  • Set the expiration of XML files such as RSS feeds to 1 hour.
I usually use rules like this in htaccess, although you can customize them for how you want.
# 1 YEAR
ExpiresActive On
<FilesMatch "\.(otf|ico|pdf|flv)$">
Header set Cache-Control "max-age=29030400, public"
ExpiresDefault "access plus 1 years"
Header unset Last-Modified
Header unset ETag
SetOutputFilter DEFLATE
</FilesMatch>

# 1 MONTHS
<FilesMatch "\.(jpg|jpeg|png|gif|swf)$">
Header set Cache-Control "max-age=2419200, public"
ExpiresDefault "access plus 1 month"
SetOutputFilter DEFLATE
</FilesMatch>

<FilesMatch "\.(xml|txt|css|js)$">
Header set Cache-Control "max-age=604800, public"
ExpiresDefault "access plus 1 week"
SetOutputFilter DEFLATE
</FilesMatch>

# 30 MIN
<FilesMatch "\.(html|htm|php)$">
SetOutputFilter DEFLATE
</FilesMatch>

The entire .htaccess file

Let's take a look at the entire htaccess config file, then go through all the configuration options.
Header unset Pragma
FileETag None
Header unset ETag
 
# cache images/pdf docs for 10 days
<FilesMatch "\.(ico|pdf|jpg|jpeg|png|gif)$">
  Header set Cache-Control "max-age=864000, public, must-revalidate"
  Header unset Last-Modified
</FilesMatch>
 
# cache html/htm/xml/txt diles for 2 days
<FilesMatch "\.(html|htm|xml|txt|xsl)$">
  Header set Cache-Control "max-age=7200, must-revalidate"
</FilesMatch>

Disabiling some of the headers

First, we unset the Pragma value from the server response headers.
Header unset Pragma
This will unset any headers that are created, by, for instance, a php script interpreted during the request. This is needed because browsers may interpret "Pragma: no-cache" as "This content is not to be cached". Then, we turn off the ETags:
FileETag None
Header unset ETag
ETags are a mechanism to determine the origin of the content on the server (such as inode location), providing the browser information to cache objects depending on where they are on the disk. Disabling them not only makes the server work faster, but also allows the browser to rely on the Cache-Control headers, which we will describe next...

Adding static content cache headers

Next we will enable caching for different type of files.
For image files and pdf documents, in this example we set the cache to 10 days (that is 864000 seconds):
<FilesMatch "\.(ico|pdf|jpg|jpeg|png|gif)$">
  Header set Cache-Control "max-age=864000, public, must-revalidate"
  Header unset Last-Modified
</FilesMatch>
The "max-age" value indicates the time difference (in seconds) after which the content will be expired and reloaded from the server. The "public" keyword presence indicates that any system along the route may cache the response. The "must-revalidate" indicates caching systems to obey other header information you may provide at a later time about the cache. This should help preventing stale caching (that is, caching that delivers content that is outdated).
You will notice that we also unset the Last-Modified header. Why are we doing this? Because by eliminating the Last-Modified and ETags headers, you are eliminating validation requests, leading to a decreased response time. This should work fine in most cases when dealing with static, rarely updated content.

Caching files that change more often

<FilesMatch "\.(html|htm|xml|txt|xsl)$">
  Header set Cache-Control "max-age=7200, must-revalidate"
</FilesMatch>
For html, htm, xml, txt, xsl files that we expect are more likely to change, we won't eliminate the Last-Modified header, thus allowing cache systems to verify the last modification date of the document. In case the http document is fresher then the local cached version, the cache will be invalidated.

If you have multiple file types that should expire after the same time after they have been accessed (let's say in one week), you can use a combination of the FilesMatchand the ExpiresDefault directives, e.g. as follows:
[...]
<IfModule mod_expires.c>
          <FilesMatch "\.(jpe?g|png|gif|js|css)$">
                      ExpiresActive On
                      ExpiresDefault "access plus 1 week"
          </FilesMatch>
</IfModule>
[...]
This would tell browsers to cache .jpg, .jpeg, .png, .gif, .js, and .css files for one week.

Instead of using FilesMatch and ExpiresDefault directives, you could also use the ExpiresByType directice and set an Expires header (plus the max-age directive of theCache-Control HTTP header) individually for each file type, e.g. as follows:
[...]
<IfModule mod_expires.c>
          ExpiresActive on

          ExpiresByType image/jpg "access plus 60 days"
          ExpiresByType image/png "access plus 60 days"
          ExpiresByType image/gif "access plus 60 days"
          ExpiresByType image/jpeg "access plus 60 days"

          ExpiresByType text/css "access plus 1 days"

          ExpiresByType image/x-icon "access plus 1 month"

          ExpiresByType application/pdf "access plus 1 month"
          ExpiresByType audio/x-wav "access plus 1 month"
          ExpiresByType audio/mpeg "access plus 1 month"
          ExpiresByType video/mpeg "access plus 1 month"
          ExpiresByType video/mp4 "access plus 1 month"
          ExpiresByType video/quicktime "access plus 1 month"
          ExpiresByType video/x-ms-wmv "access plus 1 month"
          ExpiresByType application/x-shockwave-flash "access 1 month"

          ExpiresByType text/javascript "access plus 1 week"
          ExpiresByType application/x-javascript "access plus 1 week"
          ExpiresByType application/javascript "access plus 1 week"
</IfModule>
[...]
You might have noticed that I've set three ExpiresByType directives for Javascript files - that is because Javascript files might have different file types on each server. If you set just one directive for text/javascript, but the server recognizes the Javascript file as application/javascript, then it will not be covered by your configuration, and no cache headers will be set.
You can use the following time units in your configuration:
  • years
  • months
  • weeks
  • days
  • hours
  • minutes
  • seconds
Please note that Apache accepts these time units in both singular and plural, so you can use day and days, week and weeks, etc.
It is possible to combine multiple time units, e.g. as follows:
ExpiresByType text/html "access plus 1 month 15 days 2 hours"

What to add in your .htaccess file

Open your .htaccess file. (be smart: make a copy of your original .htaccess file, in case you accidentally make a mistake and need to revert)
Now it’s time to enable the Expires headers module in Apache (set the ‘ExpiresActive’ to ‘On’), so add this to your .htaccess file:
<IfModule mod_expires.c>

# Enable expirations
ExpiresActive On 

</IfModule>
It might be useful to add a “Default directive” for a default expiry date, so that’s the 2 rows you’ll add now:
<IfModule mod_expires.c>

# Enable expirations
ExpiresActive On 

# Default directive
ExpiresDefault "access plus 1 month"

</IfModule>
That’s the base. Now add all the lines for each of your file types (you know, the ones you created earlier for your favicon, images, css and javascript). You’ll end up with a code snippet that looks something like this:
<IfModule mod_expires.c>

# Enable expirations
ExpiresActive On

# Default directive
ExpiresDefault "access plus 1 month"

# My favicon
ExpiresByType image/x-icon "access plus 1 year”

# Images
ExpiresByType image/gif "access plus 1 month"
ExpiresByType image/png "access plus 1 month"
ExpiresByType image/jpg "access plus 1 month"
ExpiresByType image/jpeg "access plus 1 month"

# CSS
ExpiresByType text/css "access 1 month”

# Javascript
ExpiresByType application/javascript "access plus 1 year"

</IfModule>
That’s it.
References
If you are interested in knowing ALL about caching including proxy caching and CDNs, please read Mark Nottingham's article "Caching Tutorial for Web Authors and Webmasters". Also included below are links to Apache's website manuals for each of the modules discussed here.
How To Control Browser Caching with Apache 2
Caching Tutorial for Web Authors and Webmasters
Apache Module mod_expires
Apache Module mod_headers