Handling 404 errors under Apache with Perl

Principle

On a large web site, where documents get moved or renamed, 404 errors (file not found) become difficult to handle. Between misspelled links and inconsistent updates, avoiding the problem completely is a tough task. You can of course track these errors in your server's log, ideally with dedicated log-analysis software. The method presented here, however, notifies you of each error almost in real time, so you can fix it sooner.

Strategy

Our solution combines an .htaccess file with a small Perl script. The script emails a 404 notification to the administrator as soon as a visitor hits a missing page.

.htaccess

On Apache, .htaccess files let you add configuration directives without restarting the server, provided the main configuration allows it through AllowOverride. They are commonly used to restrict access to private areas, but they can serve many other purposes.

Here we use ErrorDocument, the directive that customises the response sent for a given error:

ErrorDocument 404 /cgi-bin/e404.cgi

So when a 404 error occurs, Apache does not return a static document. It runs our Perl script instead.
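For the .htaccess line to take effect at all, the main server configuration must let .htaccess files override error documents. ErrorDocument belongs to the FileInfo override class. Below is a minimal sketch of the server side. The paths /var/www/html and /var/www/cgi-bin are assumptions for illustration only.

```apache
# In httpd.conf or the virtual host, not in .htaccess (paths are assumed)
<Directory "/var/www/html">
    # Allow .htaccess files in this tree to use ErrorDocument (FileInfo class)
    AllowOverride FileInfo
</Directory>

# Map /cgi-bin/ URLs to scripts so that /cgi-bin/e404.cgi is executed
ScriptAlias "/cgi-bin/" "/var/www/cgi-bin/"
```

The script itself must also be executable by the web server user, for example with chmod 755 e404.cgi.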

The e404.cgi Perl script

This Perl script sends a message telling the webmaster where and how the 404 error occurred. To do so, it first reads two environment variables:

HTTP_REFERER is the address of the document that links to the missing one. Browsers do not always send it.
REQUEST_URI is the address of the missing document itself.

#!/usr/bin/perl
# All by HAbeTT
# Fucking off the 404
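
use strict;
use warnings;

# Page holding the broken link; browsers may omit the Referer header
my $refer = $ENV{'HTTP_REFERER'} || 'unknown (no referrer sent)';
# Missing document, as originally requested by the visitor
my $locat = $ENV{'REQUEST_URI'} || 'unknown';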
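As written, the script also mails a report when someone requests /cgi-bin/e404.cgi directly. One possible refinement relies on the fact that Apache sets REDIRECT_STATUS when it runs an ErrorDocument. The script could ignore any request in which that variable is not 404. The sketch below assumes Perl 5.10 or later (for `//`). The helper name broken_link_from_env is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the broken-link details from a CGI
# environment (pass \%ENV in the real script). Returns an empty list
# unless Apache reached us through "ErrorDocument 404", which sets
# REDIRECT_STATUS; a direct request to the script is ignored.
sub broken_link_from_env {
    my ($env) = @_;
    my $status = $env->{REDIRECT_STATUS} // '';
    return () unless $status eq '404';

    # The Referer header is optional, so supply a readable default
    my $refer = $env->{HTTP_REFERER} // 'no referrer sent';
    my $locat = $env->{REQUEST_URI}  // 'unknown';
    return ($refer, $locat);
}
```

In e404.cgi, `my ($refer, $locat) = broken_link_from_env(\%ENV);` could replace the two assignments. When the list comes back empty, the script would skip the mail but still show the error page.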

Next, we open a pipe to sendmail and report the error to the webmaster.

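# -t makes sendmail take the recipients from the To: header below
if (open(my $sendmail, '|-', '/usr/lib/sendmail', '-t')) {
    print {$sendmail} <<EOM;
From: e404\@habett.org
To: webmaster\@habett.org
Subject: Error 404

We have spotted a 404 error on our site. This link is broken:

from
  $refer

to
  $locat

Please make corrections asap.

EOM
    # close() on a pipe fails if sendmail exits non-zero; $? holds the status
    close($sendmail) or warn "Sendmail didn't close nicely (status $?)\n";
}
else {
    # Don't die: the visitor should still reach the error page
    warn "Can't fork for sendmail: $!\n";
}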

Finally, we redirect the visitor to a static page. The page states that the requested file was not found (error 404), that the webmaster has been notified, and that the broken link will be repaired soon.

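# A leading slash makes Apache serve the page internally; a relative path
# would resolve against the missing URL's directory (e404.html assumed at root)
print "Location: /e404.html\n\n";
exit 0;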
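Redirecting works, but the response for the missing URL may then carry a redirect or success status instead of 404, which matters for crawlers. Apache's documentation on custom error responses therefore suggests that a CGI script used as an ErrorDocument print a `Status:` header, so that the original error reaches the client. The sketch below shows that variant: the script writes the notice itself instead of redirecting. The sub names and page text are hypothetical.

```perl
use strict;
use warnings;

# Escape the characters that matter when echoing a URL into HTML
sub html_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical: build the complete CGI response for a missing URL,
# keeping the 404 status instead of redirecting elsewhere.
sub not_found_response {
    my ($locat) = @_;
    my $safe = html_escape($locat);
    return "Status: 404 Not Found\n"
         . "Content-Type: text/html; charset=UTF-8\n\n"
         . "<html><head><title>404 Not Found</title></head><body>\n"
         . "<h1>File not found</h1>\n"
         . "<p>The page $safe does not exist. The webmaster has been notified"
         . " and will repair the broken link soon.</p>\n"
         . "</body></html>\n";
}

# In e404.cgi this would replace the Location line:
# print not_found_response($locat);
```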

Alternatives

On our own site, we use a slightly different, two-step strategy. When the error occurs, we first build a hidden form that holds the related information (referrer and target). We then ask the visitor to submit it to alert the webmaster. This approach is more selective, because crawlers rarely submit forms and so generate few bot reports. It is still easy to implement.
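The two-step variant described above could start from a form like the one generated below. The referrer and target travel in hidden fields, and nothing is mailed until the visitor submits the form. The action URL /cgi-bin/e404-report.cgi and the field names are hypothetical. The receiving script would read them and then pipe the message to sendmail as before.

```perl
use strict;
use warnings;

sub html_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical: render the "report this broken link" form. Both values
# come from the client (Referer header, requested URI), so they are
# HTML-escaped before going into attribute values.
sub report_form {
    my ($refer, $locat) = @_;
    my ($r, $l) = map { html_escape($_) } ($refer, $locat);
    return <<"HTML";
<form method="post" action="/cgi-bin/e404-report.cgi">
  <input type="hidden" name="referrer" value="$r">
  <input type="hidden" name="target" value="$l">
  <p>The page you requested does not exist.
     <input type="submit" value="Tell the webmaster"></p>
</form>
HTML
}
```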

Another refinement would be to distinguish requests for missing documents that come from inside the site, whose broken links we can fix ourselves, from those that come from outside.
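One way to sketch that distinction is to compare the host of the referring URL with the site's own host name. This assumes the name is available in SERVER_NAME, which Apache sets for CGI scripts. The sub name, the 'www.' normalisation, and the three labels are hypothetical choices.

```perl
use strict;
use warnings;

# Hypothetical: classify a broken link by where the referring page lives.
# Returns 'internal', 'external' or 'unknown' (no usable Referer).
sub referrer_origin {
    my ($refer, $server_name) = @_;
    return 'unknown' unless defined $refer && length $refer;

    # Extract the host part of an absolute http(s) URL
    my ($host) = $refer =~ m{\Ahttps?://([^/:?#]+)}i;
    return 'unknown' unless defined $host;

    # Treat "www.example.org" and "example.org" as the same site
    s/\Awww\.//i for $host, $server_name;
    return lc($host) eq lc($server_name) ? 'internal' : 'external';
}
```

In e404.cgi, calling `referrer_origin($refer, $ENV{'SERVER_NAME'})` could, for example, add the result to the mail subject so that internal links get fixed first.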
