
Friday, September 6, 2013

Speed up a web site by enabling Apache file gzip compression

Setting up the server

The "good news" is that we can't control the browser. It either sends the Accept-encoding: gzip, deflate header or it doesn't.
Our job is to configure the server so it returns zipped content if the browser can handle it, saving bandwidth for everyone (and giving us a happy user).
For IIS, enable compression in the settings.
In Apache, enabling output compression is fairly straightforward. Add the following to your .htaccess file:

# compress text, html, javascript, css, xml:
AddOutputFilterByType DEFLATE text/plain
AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType DEFLATE text/xml
AddOutputFilterByType DEFLATE text/css
AddOutputFilterByType DEFLATE application/xml
AddOutputFilterByType DEFLATE application/xhtml+xml
AddOutputFilterByType DEFLATE application/rss+xml
AddOutputFilterByType DEFLATE application/javascript
AddOutputFilterByType DEFLATE application/x-javascript

# Or, compress certain file types by extension:
<files *.html>
SetOutputFilter DEFLATE
</files>

Apache actually has two compression options:
  • mod_deflate is easier to set up and is standard.
  • mod_gzip seems more powerful: you can pre-compress content.
Deflate is quick and works, so I use it; use mod_gzip if that floats your boat. In either case, Apache checks if the browser sent the "Accept-encoding" header and returns the compressed or regular version of the file. However, some older browsers may have trouble (more below) and there are special directives you can add to correct this.
If you can't change your .htaccess file, you can use PHP to return compressed content. Give your HTML file a .php extension and add this code to the top:
<?php if (isset($_SERVER['HTTP_ACCEPT_ENCODING']) && substr_count($_SERVER['HTTP_ACCEPT_ENCODING'], 'gzip')) ob_start("ob_gzhandler"); else ob_start(); ?>
We check the "Accept-encoding" header and return a gzipped version of the file (otherwise the regular version). This is almost like building your own webserver (what fun!). But really, try to use Apache to compress your output if you can help it. You don't want to monkey with your files.
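To make the negotiation concrete, here is a minimal Python sketch of the same decision the PHP one-liner makes. It is not how Apache or PHP implement it; the function name negotiate_body and the sample page are made up for illustration, and quality values (";q=...") in the header are ignored.

```python
import gzip


def negotiate_body(accept_encoding, body):
    """Return (headers, payload), gzipping only if the client advertised gzip."""
    # Split "gzip, deflate" style values into tokens and drop any ";q=..." part.
    # Quality values are ignored in this simplified sketch.
    offered = {token.split(";")[0].strip().lower()
               for token in (accept_encoding or "").split(",")}
    if "gzip" in offered:
        payload = gzip.compress(body)
        headers = {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
    else:
        payload = body
        headers = {}
    headers["Content-Length"] = str(len(payload))
    return headers, payload


# Hypothetical page content, repeated so there is something to compress.
page = b"<html><body>" + b"<p>Hello, compression!</p>" * 200 + b"</body></html>"
gz_headers, gz_payload = negotiate_body("gzip, deflate", page)
plain_headers, plain_payload = negotiate_body(None, page)
print(gz_headers, len(gz_payload), len(plain_payload))
```

A browser that sends no Accept-encoding header gets the original bytes back untouched, which is the fallback branch in the PHP version.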

How to enable file compression

Apache 1.x and 2.x can automatically compress files, but neither one comes with a compressor enabled by default. Enabling compression reduces CSS, HTML, and JavaScript file sizes by 55-65% and speeds up overall page load times by 35-40%.
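If you want to estimate the savings for your own files before touching the server, a rough Python sketch like the following can help. The generated CSS is a made-up, highly repetitive sample, so it compresses better than real stylesheets usually do; feed in the bytes of your own files for a realistic figure.

```python
import gzip


def compression_savings(data):
    """Fraction of bytes saved by gzip, e.g. 0.6 means 60% smaller."""
    compressed = gzip.compress(data)
    return 1 - len(compressed) / len(data)


# Synthetic stylesheet used only as sample input for this sketch.
sample_css = b"".join(
    b".item-%d { margin: 0; padding: 4px 8px; color: #333; }\n" % i
    for i in range(500)
)
savings = compression_savings(sample_css)
print(f"{len(sample_css)} bytes, {savings:.0%} smaller after gzip")
```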
Apache uses plug-in modules to add functionality. For Apache 1.x, use the free mod_gzip module to compress files. For Apache 2.x, use mod_gzip or the built-in mod_deflate module.
Enable file compression using mod_gzip
The mod_gzip module can be used with Apache 1.x or 2.x, but it doesn’t come with either Apache distribution. You’ll need to download and install it separately.
  1. Windows:
    1. Log in to your PC using an account with administrator privileges.
    2. Download the zip file containing ApacheModuleGzip.dll from SourceForge.
    3. Unzip the file.
    4. Move ApacheModuleGzip.dll to your Apache modules folder (typically “c:\Program Files\Apache Group\Apache\modules”).
    5. Edit your server configuration file using a text editor like Notepad (typically “c:\Program Files\Apache Group\Apache\conf\httpd.conf”). Add the following line to your server configuration file as the last loaded module:
      LoadModule gzip_module modules/ApacheModuleGzip.dll
      Add the following lines to your server configuration file or to a site’s “.htaccess” file:
      <IfModule mod_gzip.c>
          mod_gzip_on       Yes
          mod_gzip_dechunk  Yes
          mod_gzip_item_include file      \.(html?|txt|css|js|php|pl)$
          mod_gzip_item_include handler   ^cgi-script$
          mod_gzip_item_include mime      ^text/.*
          mod_gzip_item_include mime      ^application/x-javascript.*
          mod_gzip_item_exclude mime      ^image/.*
          mod_gzip_item_exclude rspheader ^Content-Encoding:.*gzip.*
      </IfModule>
  2. Mac:
    1. Log in to your Mac using an account with administrator privileges.
    2. Download the zip file containing the module’s C source code from SourceForge.
    3. Unzip the file.
    4. Compile the module using the included instructions.
    5. Move mod_gzip.so to your Apache modules folder (typically “/usr/libexec/httpd”).
    6. Edit your server configuration file using a text editor like TextEdit or vim (typically “/etc/httpd/httpd.conf”). Add the following line to your server configuration file as the last loaded module:
      LoadModule gzip_module libexec/mod_gzip.so
      Add the following lines to your server configuration file or to a site’s “.htaccess” file:
      <IfModule mod_gzip.c>
          mod_gzip_on       Yes
          mod_gzip_dechunk  Yes
          mod_gzip_item_include file      \.(html?|txt|css|js|php|pl)$
          mod_gzip_item_include handler   ^cgi-script$
          mod_gzip_item_include mime      ^text/.*
          mod_gzip_item_include mime      ^application/x-javascript.*
          mod_gzip_item_exclude mime      ^image/.*
          mod_gzip_item_exclude rspheader ^Content-Encoding:.*gzip.*
      </IfModule>
                          
  3. Restart Apache.
The “LoadModule” line in the configuration file makes the module ready, while the other lines configure and enable it. Put these other lines in the server’s configuration file to affect all sites served by the web server. Or put them within a site’s “VirtualHost” block or in its own “.htaccess” file to affect only that site.
The remaining lines tell the module to compress files with .htm, .html, .txt, .css, .js, .php, and .pl file name extensions, the output of CGI scripts, and any output that is text or JavaScript, but not images. The last line tells the module to skip compressing content that is already compressed.

Enable file compression using mod_deflate

The mod_deflate module comes with Apache 2.x. All you need to do is enable it.
  1. Windows:
    1. Log in to your PC using an account with administrator privileges.
    2. Edit your server configuration file using a text editor like Notepad (typically “c:\Program Files\Apache Group\Apache\conf\httpd.conf”). Add the following lines to your server configuration file or to a site’s “.htaccess” file:
      <Location />
          SetOutputFilter DEFLATE
          SetEnvIfNoCase Request_URI  \
              \.(?:gif|jpe?g|png)$ no-gzip dont-vary
          SetEnvIfNoCase Request_URI  \
              \.(?:exe|t?gz|zip|gz2|sit|rar)$ no-gzip dont-vary
      </Location>
  2. Mac:
    1. Log in to your Mac using an account with administrator privileges.
    2. Edit your server configuration file using a text editor like TextEdit or vim (typically “/etc/httpd/httpd.conf”). Add the following lines to your server configuration file or to a site’s “.htaccess” file:
      <Location />
          SetOutputFilter DEFLATE
          SetEnvIfNoCase Request_URI  \
              \.(?:gif|jpe?g|png)$ no-gzip dont-vary
          SetEnvIfNoCase Request_URI  \
              \.(?:exe|t?gz|zip|gz2|sit|rar)$ no-gzip dont-vary
      </Location> 
  3. Restart Apache.
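Put the configuration lines in the server’s configuration file to affect all sites served by the web server. Or put them within a site’s “VirtualHost” block to affect only that site. Apache does not allow the <Location> container in “.htaccess” files, so to use a site’s “.htaccess” file instead, leave out the <Location /> and </Location> lines and keep the directives inside them.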
The “SetOutputFilter” line enables the module.
The next two lines instruct the module to skip compressing image files (.gif, .jpg, .jpeg, .png), executables (.exe), and compressed files (.gz, .tgz, .zip, .gz2, .sit, .rar). Everything else gets compressed.
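To see which requests those two patterns catch, here is a small Python sketch that applies the same regular expressions, case-insensitively, to a few made-up request paths. It only mirrors the pattern matching; Apache's handling of the no-gzip variable is not modelled, and the function name is hypothetical.

```python
import re

# The two patterns from the SetEnvIfNoCase lines; re.IGNORECASE mirrors "NoCase".
SKIP_PATTERNS = [
    re.compile(r"\.(?:gif|jpe?g|png)$", re.IGNORECASE),
    re.compile(r"\.(?:exe|t?gz|zip|gz2|sit|rar)$", re.IGNORECASE),
]


def would_skip_compression(request_uri):
    """True if either pattern matches, i.e. the request would be marked no-gzip."""
    return any(pattern.search(request_uri) for pattern in SKIP_PATTERNS)


# Hypothetical request paths for illustration.
for uri in ["/index.html", "/img/logo.PNG", "/downloads/site.tgz", "/css/main.css"]:
    print(uri, "-> no-gzip" if would_skip_compression(uri) else "-> DEFLATE")
```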

Verify Your Compression

Once you've configured your server, check to make sure you're actually serving up compressed content.
  • Online: Use the online gzip test to check whether your page is compressed.
  • In your browser: Use Web Developer Toolbar > Information > View Document Size to see whether the page is compressed.
  • View the headers: Use Live HTTP Headers to examine the response. Look for a line that says "Content-encoding: gzip".
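If you have captured the response headers as text (from a header viewer or a log), the check for the header can be scripted. The sketch below is one simple way to do it in Python; the sample headers are invented for the example, and the parser is deliberately naive (no folded header lines, last value wins on duplicates).

```python
def parse_headers(raw):
    """Parse 'Name: value' lines into a dict keyed by lowercase header name."""
    headers = {}
    for line in raw.strip().splitlines():
        if ":" not in line:
            continue  # skip the status line or anything malformed
        name, value = line.split(":", 1)
        headers[name.strip().lower()] = value.strip()
    return headers


def is_gzip_response(raw):
    """True if the Content-Encoding header lists gzip."""
    encoding = parse_headers(raw).get("content-encoding", "")
    return "gzip" in [part.strip().lower() for part in encoding.split(",")]


# Hypothetical response headers, made up for this example.
sample = """HTTP/1.1 200 OK
Content-Type: text/html
Content-Encoding: gzip
Vary: Accept-Encoding
"""
print(is_gzip_response(sample))
```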

Leverage browser caching: How To Control Browser Caching with Apache 2 or htaccess

Examining Your HTTP Headers - The other way of finding out if your site is taking advantage of browser caching is to examine the HTTP headers on any given server response, be it a whole page or a single file. You can do this using the Live HTTP Headers extension mentioned above. A typical HTTP response header looks something like this:
HTTP/1.1 200 OK
Date: Sun, 19 Feb 2006 16:42:05 GMT
Server: Apache/2.0.55 (Unix) DAV/2 PHP/5.1.1
Last-Modified: Sun, 08 Jan 2006 16:17:25 GMT
ETag: "73049-defc-37c52f40"
Accept-Ranges: bytes
Content-Length: 57084
Cache-Control: max-age=7200
Expires: Sun, 19 Feb 2006 18:42:05 GMT
Connection: close
Content-Type: image/png

What we are looking for in the header are the "Cache-Control" and "Expires" fields. These fields control how long the browser will cache this page or media asset from your server. A very low value, such as "max-age=1" in "Cache-Control", can be just as bad as no value at all.
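As a quick sanity check on those two fields, the following Python sketch reads the max-age value and compares Expires against Date for the sample response shown above. The function name freshness_seconds is made up for this example, and only the three relevant headers are included.

```python
import re
from email.utils import parsedate_to_datetime


def freshness_seconds(headers):
    """Return (max_age, expires_minus_date) in seconds; None where a field is missing."""
    max_age = None
    match = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
    if match:
        max_age = int(match.group(1))
    expires_delta = None
    if "Expires" in headers and "Date" in headers:
        delta = (parsedate_to_datetime(headers["Expires"])
                 - parsedate_to_datetime(headers["Date"]))
        expires_delta = int(delta.total_seconds())
    return max_age, expires_delta


# The three caching-related headers from the sample response above.
example = {
    "Date": "Sun, 19 Feb 2006 16:42:05 GMT",
    "Cache-Control": "max-age=7200",
    "Expires": "Sun, 19 Feb 2006 18:42:05 GMT",
}
print(freshness_seconds(example))
```

Both values come out as 7200 seconds, so in that sample the two headers agree on a two-hour lifetime.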

If you have those fields present, you've got a head start on the situation. If not, you may need to make sure Apache 2 is loading the mod_expires and mod_headers modules. Most installations and builds of Apache 2 include these modules since they are pretty essential. If you are using the Apache 2 in OS X Server's opt directory, this is the case and you only need to make sure that they are turned on by opening the httpd.conf file and making sure the following two lines do not have a # sign in front of them.
LoadModule expires_module modules/mod_expires.so
LoadModule headers_module modules/mod_headers.so

When these modules are loaded and working, we can then start to use the correct Apache 2 directives to control how browsers cache everything from pages to images across the whole site or in a specific directory.

Using Apache 2 To Control Browser Caching

Now that we are here, I can assume that you have both the mod_expires and mod_headers modules compiled and loaded into your Apache 2 installation. I can also assume that you have examined a few headers from your server's responses and determined that the "Cache-Control" and "Expires" fields are either not set or are configured at such a low value as to be ineffective. Lastly, you may have bypassed some header examinations and just determined that browser caching is not happening by tailing your Apache 2 log file and monitoring redundant requests. Either way, let's get to fixing Apache 2 to control browser caching in a way that is right for you and your particular site.


Here is a snippet of Apache 2 directives that we will be using. It can be modified to suit your tastes or used "as is" by most users. The directives can be placed inside the <Directory> block of your virtual host in httpd.conf or directly in a .htaccess file in the root of your website.

<IfModule mod_expires.c>
ExpiresActive On
ExpiresDefault "access plus 1 seconds"
ExpiresByType text/html "access plus 1 seconds"
ExpiresByType image/gif "access plus 120 minutes"
ExpiresByType image/jpeg "access plus 120 minutes"
ExpiresByType image/png "access plus 120 minutes"
ExpiresByType text/css "access plus 60 minutes"
ExpiresByType text/javascript "access plus 60 minutes"
ExpiresByType application/x-javascript "access plus 60 minutes"
ExpiresByType text/xml "access plus 60 minutes"
</IfModule>


Let's briefly examine what is happening here. If you are interested, full documentation of the expires directives can be found on Apache's website, along with syntax formats other than the one used here. I like this format since it is inherently legible. This directive will do the following:
  • Set the default expiration of content in the browser cache to 1 second past the time of accessing that content. This is good for setting a catchall or default if you fail to explicitly define a content type in the following directives.
  • Set the expiration of text/html pages to 1 second. My content management system Drupal does this already in its .htaccess file, but I include it here if you wish to change it. I think this is a good setting since technically most html pages are small and I like to err on the side of caution and always want my page content to be fresh. For instance, I may make changes to my global template and want it to be visible immediately.
  • Set the expiration of standard images like GIF, JPEG, and PNG to 2 hours.
  • Set the expiration of CSS and JavaScript to 1 hour.
  • Set the expiration of XML files such as RSS feeds to 1 hour.
I usually use rules like this in .htaccess, although you can customize them however you want.
# 1 YEAR
ExpiresActive On
<FilesMatch "\.(otf|ico|pdf|flv)$">
Header set Cache-Control "max-age=29030400, public"
ExpiresDefault "access plus 1 years"
Header unset Last-Modified
Header unset ETag
SetOutputFilter DEFLATE
</FilesMatch>

# 1 MONTH
<FilesMatch "\.(jpg|jpeg|png|gif|swf)$">
Header set Cache-Control "max-age=2419200, public"
ExpiresDefault "access plus 1 month"
SetOutputFilter DEFLATE
</FilesMatch>

# 1 WEEK
<FilesMatch "\.(xml|txt|css|js)$">
Header set Cache-Control "max-age=604800, public"
ExpiresDefault "access plus 1 week"
SetOutputFilter DEFLATE
</FilesMatch>

# HTML/PHP: compress only (this block sets no cache lifetime)
<FilesMatch "\.(html|htm|php)$">
SetOutputFilter DEFLATE
</FilesMatch>

The entire .htaccess file

Let's take a look at the entire htaccess config file, then go through all the configuration options.
Header unset Pragma
FileETag None
Header unset ETag
 
# cache images/pdf docs for 10 days
<FilesMatch "\.(ico|pdf|jpg|jpeg|png|gif)$">
  Header set Cache-Control "max-age=864000, public, must-revalidate"
  Header unset Last-Modified
</FilesMatch>
 
# cache html/htm/xml/txt files for 2 hours
<FilesMatch "\.(html|htm|xml|txt|xsl)$">
  Header set Cache-Control "max-age=7200, must-revalidate"
</FilesMatch>

Disabling some of the headers

First, we unset the Pragma value from the server response headers.
Header unset Pragma
This removes any Pragma header created by, for instance, a PHP script interpreted during the request. This is needed because browsers may interpret "Pragma: no-cache" as "This content is not to be cached". Then, we turn off the ETags:
FileETag None
Header unset ETag
ETags are validators: the server sends an identifier for the current version of a file, and the browser sends it back on later requests to ask whether its cached copy is still current. Apache can build ETags from file attributes such as the inode, which ties them to where the file sits on the disk. Disabling them not only saves the server some work, but also lets the browser rely on the Cache-Control headers, which we will describe next...

Adding static content cache headers

Next we will enable caching for different types of files.
For image files and pdf documents, in this example we set the cache to 10 days (that is 864000 seconds):
<FilesMatch "\.(ico|pdf|jpg|jpeg|png|gif)$">
  Header set Cache-Control "max-age=864000, public, must-revalidate"
  Header unset Last-Modified
</FilesMatch>
The "max-age" value is the time (in seconds) after which the cached content expires and is reloaded from the server. The "public" keyword indicates that any system along the route may cache the response. The "must-revalidate" keyword tells caches that once the content has expired they must check with the server before using it again, rather than serving the stale copy. This should help prevent stale caching (that is, caching that delivers outdated content).
You will notice that we also unset the Last-Modified header. Why are we doing this? Because by eliminating the Last-Modified and ETag headers, you are eliminating validation requests, leading to a decreased response time. This should work fine in most cases when dealing with static, rarely updated content.
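Since max-age is always given in seconds, it is easy to get the arithmetic wrong. This small Python helper, a hypothetical convenience and not part of Apache, turns a readable duration into the header value so you can check numbers like 864000 and 7200 from the configuration above.

```python
from datetime import timedelta


def cache_control(lifetime, *directives):
    """Build a Cache-Control value with max-age taken from a timedelta."""
    parts = [f"max-age={int(lifetime.total_seconds())}", *directives]
    return ", ".join(parts)


print(cache_control(timedelta(days=10), "public", "must-revalidate"))
print(cache_control(timedelta(hours=2), "must-revalidate"))
```

Ten days works out to 864000 seconds and two hours to 7200 seconds, matching the two FilesMatch blocks.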

Caching files that change more often

<FilesMatch "\.(html|htm|xml|txt|xsl)$">
  Header set Cache-Control "max-age=7200, must-revalidate"
</FilesMatch>
For html, htm, xml, txt, and xsl files, which we expect are more likely to change, we won't eliminate the Last-Modified header, thus allowing cache systems to verify the last modification date of the document. If the document on the server is fresher than the locally cached version, the cache will be invalidated.
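The validation step that Last-Modified makes possible can be sketched in a few lines of Python. This is a simplified illustration of the server-side decision, not Apache's actual code; the function name and the modification time are assumptions made for the example.

```python
from email.utils import formatdate, parsedate_to_datetime


def conditional_status(if_modified_since, file_mtime):
    """Return 304 if the client's cached copy is current, else 200."""
    if if_modified_since:
        try:
            client_time = parsedate_to_datetime(if_modified_since).timestamp()
        except (TypeError, ValueError):
            client_time = None  # unparseable date: fall through to a full response
        if client_time is not None and int(file_mtime) <= client_time:
            return 304
    return 200


mtime = 1136737045  # hypothetical modification time (seconds since the epoch)
last_modified = formatdate(mtime, usegmt=True)
print(last_modified,
      conditional_status(last_modified, mtime),
      conditional_status(None, mtime))
```

A 304 response carries no body, so the browser reuses its cached copy; a client with no cached copy, or an older one, gets the full 200 response.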

If you have multiple file types that should expire the same length of time after they have been accessed (let's say one week), you can use a combination of the FilesMatch and the ExpiresDefault directives, e.g. as follows:
[...]
<IfModule mod_expires.c>
          <FilesMatch "\.(jpe?g|png|gif|js|css)$">
                      ExpiresActive On
                      ExpiresDefault "access plus 1 week"
          </FilesMatch>
</IfModule>
[...]
This would tell browsers to cache .jpg, .jpeg, .png, .gif, .js, and .css files for one week.

Instead of using FilesMatch and ExpiresDefault directives, you could also use the ExpiresByType directive and set an Expires header (plus the max-age directive of the Cache-Control HTTP header) individually for each file type, e.g. as follows:
[...]
<IfModule mod_expires.c>
          ExpiresActive on

          ExpiresByType image/jpg "access plus 60 days"
          ExpiresByType image/png "access plus 60 days"
          ExpiresByType image/gif "access plus 60 days"
          ExpiresByType image/jpeg "access plus 60 days"

          ExpiresByType text/css "access plus 1 days"

          ExpiresByType image/x-icon "access plus 1 month"

          ExpiresByType application/pdf "access plus 1 month"
          ExpiresByType audio/x-wav "access plus 1 month"
          ExpiresByType audio/mpeg "access plus 1 month"
          ExpiresByType video/mpeg "access plus 1 month"
          ExpiresByType video/mp4 "access plus 1 month"
          ExpiresByType video/quicktime "access plus 1 month"
          ExpiresByType video/x-ms-wmv "access plus 1 month"
          ExpiresByType application/x-shockwave-flash "access 1 month"

          ExpiresByType text/javascript "access plus 1 week"
          ExpiresByType application/x-javascript "access plus 1 week"
          ExpiresByType application/javascript "access plus 1 week"
</IfModule>
[...]
You might have noticed that I've set three ExpiresByType directives for JavaScript files - that is because JavaScript files might be served with a different MIME type on each server. If you set just one directive for text/javascript, but the server recognizes the JavaScript file as application/javascript, then it will not be covered by your configuration, and no cache headers will be set.
You can use the following time units in your configuration:
  • years
  • months
  • weeks
  • days
  • hours
  • minutes
  • seconds
Please note that Apache accepts these time units in both singular and plural, so you can use day and days, week and weeks, etc.
It is possible to combine multiple time units, e.g. as follows:
ExpiresByType text/html "access plus 1 month 15 days 2 hours"
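To check what a combined expression adds up to, here is a rough Python sketch that converts this syntax into seconds. It is an illustration only, not Apache's parser: the function name is made up, and treating a month as 30 days and a year as 365 days is an assumption you should verify against the mod_expires documentation.

```python
import re

# Seconds per unit. The fixed 30-day month and 365-day year are assumptions
# about how the units are converted, not something stated in this article.
UNIT_SECONDS = {
    "year": 365 * 86400, "month": 30 * 86400, "week": 7 * 86400,
    "day": 86400, "hour": 3600, "minute": 60, "second": 1,
}

SPEC = re.compile(r"(access|now|modification)(?:\s+plus)?((?:\s+\d+\s+[a-z]+)+)")


def expires_seconds(spec):
    """Convert e.g. 'access plus 1 month 15 days 2 hours' into seconds."""
    match = SPEC.fullmatch(spec.strip().lower())
    if not match:
        raise ValueError(f"unrecognised expiry spec: {spec!r}")
    total = 0
    for amount, unit in re.findall(r"(\d+)\s+([a-z]+)", match.group(2)):
        unit = unit.rstrip("s")  # accept singular and plural forms
        if unit not in UNIT_SECONDS:
            raise ValueError(f"unknown unit: {unit!r}")
        total += int(amount) * UNIT_SECONDS[unit]
    return total


print(expires_seconds("access plus 1 month 15 days 2 hours"))
```

Under those assumptions the example above comes to 3895200 seconds, which is the number you would expect to see as max-age in the response.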

What to add in your .htaccess file

Open your .htaccess file. (be smart: make a copy of your original .htaccess file, in case you accidentally make a mistake and need to revert)
Now it’s time to enable the Expires headers module in Apache (set the ‘ExpiresActive’ to ‘On’), so add this to your .htaccess file:
<IfModule mod_expires.c>

# Enable expirations
ExpiresActive On 

</IfModule>
It might be useful to add a “Default directive” for a default expiry date, so those are the two lines you’ll add now:
<IfModule mod_expires.c>

# Enable expirations
ExpiresActive On 

# Default directive
ExpiresDefault "access plus 1 month"

</IfModule>
That’s the base. Now add all the lines for each of your file types (you know, the ones you created earlier for your favicon, images, css and javascript). You’ll end up with a code snippet that looks something like this:
<IfModule mod_expires.c>

# Enable expirations
ExpiresActive On

# Default directive
ExpiresDefault "access plus 1 month"

# My favicon
ExpiresByType image/x-icon "access plus 1 year"

# Images
ExpiresByType image/gif "access plus 1 month"
ExpiresByType image/png "access plus 1 month"
ExpiresByType image/jpg "access plus 1 month"
ExpiresByType image/jpeg "access plus 1 month"

# CSS
ExpiresByType text/css "access 1 month"

# Javascript
ExpiresByType application/javascript "access plus 1 year"

</IfModule>
That’s it.
References
If you are interested in knowing ALL about caching including proxy caching and CDNs, please read Mark Nottingham's article "Caching Tutorial for Web Authors and Webmasters". Also included below are links to Apache's website manuals for each of the modules discussed here.
How To Control Browser Caching with Apache 2
Caching Tutorial for Web Authors and Webmasters
Apache Module mod_expires
Apache Module mod_headers

PHP image output and browser caching

The code is pretty simple, and borrows heavily from code pasted by mandor at mandor dot net for the PHP header function. We generalised it for different graphic file types and a second function so it works when PHP is an Apache module or cgi.
The first function, displayGraphicFile, returns a graphic file. The function assumes the file exists.
// Return the requested graphic file to the browser
// or a 304 code to use the cached browser copy
function displayGraphicFile ($graphicFileName, $fileType='jpeg') {
  $fileModTime = filemtime($graphicFileName);
  // Getting headers sent by the client.
  $headers = getRequestHeaders();
  // Checking if the client is validating his cache and if it is current.
  if (isset($headers['If-Modified-Since']) && (strtotime($headers['If-Modified-Since']) == $fileModTime)) {

    // Client's cache IS current, so we just respond '304 Not Modified'.
    header('Last-Modified: '.gmdate('D, d M Y H:i:s', $fileModTime).' GMT', true, 304);
  } else {
    // Image not cached or cache outdated, we respond '200 OK' and output the image.
    header('Last-Modified: '.gmdate('D, d M Y H:i:s', $fileModTime).' GMT', true, 200);
    header('Content-type: image/'.$fileType);
    header('Content-transfer-encoding: binary');
    header('Content-length: '.filesize($graphicFileName));
    readfile($graphicFileName);
  }
}
The second function gets the request header details. We specifically require the ‘If-Modified-Since’ header.
// return the browser request header
// use built in apache ftn when PHP built as module,
// or query $_SERVER when cgi
function getRequestHeaders() {
  if (function_exists("apache_request_headers")) {
    if($headers = apache_request_headers()) {
      return $headers;

    }
  }
  $headers = array();
  // Grab the IF_MODIFIED_SINCE header
  if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])) {
    $headers['If-Modified-Since'] = $_SERVER['HTTP_IF_MODIFIED_SINCE'];
  }
  return $headers;
}

Apache Access Log Rotation In AppServ And Xampp

Apache access log rotation is disabled by default in AppServ and Xampp. By default, the access log is created at “xampp/apache/logs/access.log” for Xampp and “AppServ/ApacheX.X/logs/access.log” for AppServ. On a busy server, the access log grows quickly in proportion to traffic, and without rotation access.log becomes very large. Manual rotation, by periodically backing up the existing log, cannot be done while the Apache server is running, because Apache continues writing to access.log until the server is stopped.
Smaller log files lighten the load on the Apache server, because writing to a very large file takes more resources. In this tutorial, access.log is rotated daily, so each log is named “access_year-month-date.log”, e.g. access_12-01-01.log for the access log of January 1, 2012.

Editing httpd.conf configuration

Edit your httpd.conf, found at “Xampp/apache/conf/httpd.conf” for Xampp and “AppServ/ApacheX.X/conf/httpd.conf” for AppServ. Find the line that looks like this:
CustomLog "logs/access.log" common
Comment out this line by adding # at the beginning. Then add a new line below it with this setting:
CustomLog "|bin/rotatelogs.exe logs/access_%y-%m-%d.log 86400" common
Make sure that rotatelogs.exe exists in the bin directory. The numeric value 86400 is the interval, in seconds, after which a new file is started; it comes from 24 hours * 60 minutes * 60 seconds, so the log rotates daily. Don’t forget to save httpd.conf and restart Apache. Once the server is up, look in “xampp/apache/logs” for Xampp or “AppServ/ApacheX.X/logs” for AppServ; there will be an access log with the new name format: “access_year-month-date.log”. For more explanation, see Apache HTTP Server Log Files.
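To preview which file name a given moment would be logged under, the naming pattern and the 86400-second interval can be reproduced in a short Python sketch. This only imitates the idea; it is not rotatelogs itself, the function name is made up, and computing the interval boundaries in UTC is an assumption of the sketch.

```python
from datetime import datetime, timezone

ROTATION_SECONDS = 86400  # 24 hours * 60 minutes * 60 seconds


def rotated_log_name(timestamp, pattern="access_%y-%m-%d.log"):
    """Name of the log file covering the given Unix timestamp."""
    # Round down to the start of the current interval, then format that moment.
    # Boundaries are computed in UTC here, which is an assumption of this sketch.
    interval_start = timestamp - (timestamp % ROTATION_SECONDS)
    return datetime.fromtimestamp(interval_start, tz=timezone.utc).strftime(pattern)


noon = datetime(2012, 1, 1, 12, 0, tzinfo=timezone.utc).timestamp()
print(rotated_log_name(noon))
```

For noon on January 1, 2012 this gives access_12-01-01.log, the same name used as the example earlier in this tutorial.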

Combined Log Format

Another commonly used format string is called the Combined Log Format. It can be used as follows.
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined
CustomLog log/access_log combined
This format is exactly the same as the Common Log Format, with the addition of two more fields. Each of the additional fields uses the percent-directive %{header}i, where header can be any HTTP request header. The access log under this format will look like:
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
The additional fields are:
"http://www.example.com/start.html" (\"%{Referer}i\")
The "Referer" (sic) HTTP request header. This gives the site that the client reports having been referred from. (This should be the page that links to or includes /apache_pb.gif).
"Mozilla/4.08 [en] (Win98; I ;Nav)" (\"%{User-agent}i\")
The User-Agent HTTP request header. This is the identifying information that the client browser reports about itself.
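If you want to pull these fields out of a log with a script, a regular expression is usually enough. The Python sketch below is one possible pattern for the Combined Log Format, tested only against the sample line above; the field names are chosen for this example, and quoted fields containing escaped quotes are not handled.

```python
import re

# One named group per field of the Combined Log Format.
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_combined(line):
    """Return a dict of fields, or None if the line does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


entry = parse_combined(
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" '
    '200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
)
print(entry["referer"], "|", entry["agent"])
```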