Friday, September 6, 2013

Leverage browser caching: How To Control Browser Caching with Apache 2 or htaccess

Examining Your HTTP Headers - The other way of finding out if your site is taking advantage of browser caching is to examine the HTTP headers on any given server response, be it a whole page or a single file. You can do this using the Live HTTP Headers Chrome mentioned above. A common HTTP header response will look something like this.
HTTP/1.1 200 OK
Date: Sun, 19 Feb 2006 16:42:05 GMT
Server: Apache/2.0.55 (Unix) DAV/2 PHP/5.1.1
Last-Modified: Sun, 08 Jan 2006 16:17:25 GMT
ETag: "73049-defc-37c52f40"
Accept-Ranges: bytes
Content-Length: 57084
Cache-Control: max-age=7200
Expires: Sun, 19 Feb 2006 18:42:05 GMT
Connection: close
Content-Type: image/png

What we are looking for in the header are the "Cache-Control" and "Expires" fields. These fields control how long the browser will cache this media or page asset from your server. Having a low value like "1" in "Cache-Control" can be just as bad as no value at all.

If you have those fields present, you've got a head start on the situation. If not, you may need to make sure Apache 2 is loading the mod_expires and mod_headers modules. Most installations and builds of Apache 2 include these modules since they are pretty essential. If you are using the Apache 2 in OS X Server's opt directory, this is the case and you only need to make sure that they are turned on by opening the http.conf file and making sure the following two lines do not have an # sign in front of them.
LoadModule expires_module modules/mod_expires.so
LoadModule headers_module modules/mod_headers.so

When these modules are loaded and working, we can then start to use the correct Apache 2 directives to control how browsers cache everything from pages to images across the whole site or in a specific directory.

Using Apache 2 To Control Browser Caching

Now that we are here, I can assume that you have both the mod_expires and mod_headers modules compiled and loaded into your Apache 2 installation. I can also assume that you have examined a few headers from your server's responses and determined that the "Cache-Control" and "Expires" fields are either not set or they are configured at such a low value to be ineffective. Lastly, you may have bypassed some header examinations and just determined that browser caching is not happening by tailing your Apache 2 log file and monitoring redundant requests. Either way, let's get to fixing Apache 2 to control browser caching that is right for you and your particular site.


Here is a code snippet of an Apache 2 directive that we will be using. This directive can be modified to suite your tastes or it can just be used "as is" for most users. The directives here can be placed into the <Directory> directive of your virtual host in http.conf or it can be placed loosely in a .htaccess file in the root of your website.

<IfModule mod_expires.c>
ExpiresActive On
ExpiresDefault "access plus 1 seconds"
ExpiresByType text/html "access plus 1 seconds"
ExpiresByType image/gif "access plus 120 minutes"
ExpiresByType image/jpeg "access plus 120 minutes"
ExpiresByType image/png "access plus 120 minutes"
ExpiresByType text/css "access plus 60 minutes"
ExpiresByType text/javascript "access plus 60 minutes"
ExpiresByType application/x-javascript "access plus 60 minutes"
ExpiresByType text/xml "access plus 60 minutes"
</IfModule>


Let's examine what is happening here in brief. If you are interested full documentation of the expires directive can be found on Apache's website along with different syntax formats than the ones used here. I like this format since it is inheritably legible. This directive will do the following:
  • Set the default expiration of content in the browser cache to 1 second past the time of accessing that content. This is good for setting a catchall or default if you fail to explicitly define a content type in the following directives.
  • Set the expiration of text/html pages to 1 second. My content management system Drupal does this already in its .htaccess file, but I include it here if you wish to change it. I think this is a good setting since technically most html pages are small and I like to err on the side of caution and always want my page content to be fresh. For instance, I may make changes to my global template and want it to be visible immediately.
  • Set the expiration of standard images like GIFF, JPEG, and PNG to 2 hours.
  • Set the expiration of CSS and JavaScript to 1 hour.
  • Set the expiration of XML files such as RSS feeds to 1 hour.
I usually use rules like this in htaccess, although you can customize them for how you want.
# 1 YEAR
ExpiresActive On
<FilesMatch "\.(otf|ico|pdf|flv)$">
Header set Cache-Control "max-age=29030400, public"
ExpiresDefault "access plus 1 years"
Header unset Last-Modified
Header unset ETag
SetOutputFilter DEFLATE
</FilesMatch>

# 1 MONTHS
<FilesMatch "\.(jpg|jpeg|png|gif|swf)$">
Header set Cache-Control "max-age=2419200, public"
ExpiresDefault "access plus 1 month"
SetOutputFilter DEFLATE
</FilesMatch>

<FilesMatch "\.(xml|txt|css|js)$">
Header set Cache-Control "max-age=604800, public"
ExpiresDefault "access plus 1 week"
SetOutputFilter DEFLATE
</FilesMatch>

# 30 MIN
<FilesMatch "\.(html|htm|php)$">
SetOutputFilter DEFLATE
</FilesMatch>

The entire .htaccess file

Let's take a look at the entire htaccess config file, then go through all the configuration options.
Header unset Pragma
FileETag None
Header unset ETag
 
# cache images/pdf docs for 10 days
<FilesMatch "\.(ico|pdf|jpg|jpeg|png|gif)$">
  Header set Cache-Control "max-age=864000, public, must-revalidate"
  Header unset Last-Modified
</FilesMatch>
 
# cache html/htm/xml/txt diles for 2 days
<FilesMatch "\.(html|htm|xml|txt|xsl)$">
  Header set Cache-Control "max-age=7200, must-revalidate"
</FilesMatch>

Disabiling some of the headers

First, we unset the Pragma value from the server response headers.
Header unset Pragma
This will unset any headers that are created, by, for instance, a php script interpreted during the request. This is needed because browsers may interpret "Pragma: no-cache" as "This content is not to be cached". Then, we turn off the ETags:
FileETag None
Header unset ETag
ETags are a mechanism to determine the origin of the content on the server (such as inode location), providing the browser information to cache objects depending on where they are on the disk. Disabling them not only makes the server work faster, but also allows the browser to rely on the Cache-Control headers, which we will describe next...

Adding static content cache headers

Next we will enable caching for different type of files.
For image files and pdf documents, in this example we set the cache to 10 days (that is 864000 seconds):
<FilesMatch "\.(ico|pdf|jpg|jpeg|png|gif)$">
  Header set Cache-Control "max-age=864000, public, must-revalidate"
  Header unset Last-Modified
</FilesMatch>
The "max-age" value indicates the time difference (in seconds) after which the content will be expired and reloaded from the server. The "public" keyword presence indicates that any system along the route may cache the response. The "must-revalidate" indicates caching systems to obey other header information you may provide at a later time about the cache. This should help preventing stale caching (that is, caching that delivers content that is outdated).
You will notice that we also unset the Last-Modified header. Why are we doing this? Because by eliminating the Last-Modified and ETags headers, you are eliminating validation requests, leading to a decreased response time. This should work fine in most cases when dealing with static, rarely updated content.

Caching files that change more often

<FilesMatch "\.(html|htm|xml|txt|xsl)$">
  Header set Cache-Control "max-age=7200, must-revalidate"
</FilesMatch>
For html, htm, xml, txt, xsl files that we expect are more likely to change, we won't eliminate the Last-Modified header, thus allowing cache systems to verify the last modification date of the document. In case the http document is fresher then the local cached version, the cache will be invalidated.

If you have multiple file types that should expire after the same time after they have been accessed (let's say in one week), you can use a combination of the FilesMatchand the ExpiresDefault directives, e.g. as follows:
[...]
<IfModule mod_expires.c>
          <FilesMatch "\.(jpe?g|png|gif|js|css)$">
                      ExpiresActive On
                      ExpiresDefault "access plus 1 week"
          </FilesMatch>
</IfModule>
[...]
This would tell browsers to cache .jpg, .jpeg, .png, .gif, .js, and .css files for one week.

Instead of using FilesMatch and ExpiresDefault directives, you could also use the ExpiresByType directice and set an Expires header (plus the max-age directive of theCache-Control HTTP header) individually for each file type, e.g. as follows:
[...]
<IfModule mod_expires.c>
          ExpiresActive on

          ExpiresByType image/jpg "access plus 60 days"
          ExpiresByType image/png "access plus 60 days"
          ExpiresByType image/gif "access plus 60 days"
          ExpiresByType image/jpeg "access plus 60 days"

          ExpiresByType text/css "access plus 1 days"

          ExpiresByType image/x-icon "access plus 1 month"

          ExpiresByType application/pdf "access plus 1 month"
          ExpiresByType audio/x-wav "access plus 1 month"
          ExpiresByType audio/mpeg "access plus 1 month"
          ExpiresByType video/mpeg "access plus 1 month"
          ExpiresByType video/mp4 "access plus 1 month"
          ExpiresByType video/quicktime "access plus 1 month"
          ExpiresByType video/x-ms-wmv "access plus 1 month"
          ExpiresByType application/x-shockwave-flash "access 1 month"

          ExpiresByType text/javascript "access plus 1 week"
          ExpiresByType application/x-javascript "access plus 1 week"
          ExpiresByType application/javascript "access plus 1 week"
</IfModule>
[...]
You might have noticed that I've set three ExpiresByType directives for Javascript files - that is because Javascript files might have different file types on each server. If you set just one directive for text/javascript, but the server recognizes the Javascript file as application/javascript, then it will not be covered by your configuration, and no cache headers will be set.
You can use the following time units in your configuration:
  • years
  • months
  • weeks
  • days
  • hours
  • minutes
  • seconds
Please note that Apache accepts these time units in both singular and plural, so you can use day and days, week and weeks, etc.
It is possible to combine multiple time units, e.g. as follows:
ExpiresByType text/html "access plus 1 month 15 days 2 hours"

What to add in your .htaccess file

Open your .htaccess file. (be smart: make a copy of your original .htaccess file, in case you accidentally make a mistake and need to revert)
Now it’s time to enable the Expires headers module in Apache (set the ‘ExpiresActive’ to ‘On’), so add this to your .htaccess file:
<IfModule mod_expires.c>

# Enable expirations
ExpiresActive On 

</IfModule>
It might be useful to add a “Default directive” for a default expiry date, so that’s the 2 rows you’ll add now:
<IfModule mod_expires.c>

# Enable expirations
ExpiresActive On 

# Default directive
ExpiresDefault "access plus 1 month"

</IfModule>
That’s the base. Now add all the lines for each of your file types (you know, the ones you created earlier for your favicon, images, css and javascript). You’ll end up with a code snippet that looks something like this:
<IfModule mod_expires.c>

# Enable expirations
ExpiresActive On

# Default directive
ExpiresDefault "access plus 1 month"

# My favicon
ExpiresByType image/x-icon "access plus 1 year”

# Images
ExpiresByType image/gif "access plus 1 month"
ExpiresByType image/png "access plus 1 month"
ExpiresByType image/jpg "access plus 1 month"
ExpiresByType image/jpeg "access plus 1 month"

# CSS
ExpiresByType text/css "access 1 month”

# Javascript
ExpiresByType application/javascript "access plus 1 year"

</IfModule>
That’s it.
References
If you are interested in knowing ALL about caching including proxy caching and CDNs, please read Mark Nottingham's article "Caching Tutorial for Web Authors and Webmasters". Also included below are links to Apache's website manuals for each of the modules discussed here.
How To Control Browser Caching with Apache 2
Caching Tutorial for Web Authors and Webmasters
Apache Module mod_expires
Apache Module mod_headers

No comments:

Post a Comment