In a large web site where documents get moved or renamed, 404 (file not found) errors become difficult to handle. Between misspellings and non-uniform updates, avoiding the problem completely is a tough task. Of course, you can track these errors in your server log, ideally with dedicated software, but the method we present here allows almost real-time notification of errors and so speeds up fixing them.
Our solution combines .htaccess files with a small Perl script that emails the administrator a 404 notification as soon as a visitor encounters one.
On Apache, .htaccess files let you add configuration directives without restarting the server. They are commonly used to restrict access to private areas but can serve many other purposes.
Here, we are going to use the directive that customises error messages:
ErrorDocument 404 /cgi-bin/e404.cgi
Instead of serving a static document when a 404 error occurs, the server executes the Perl script we specify.
This Perl script sends a message to the webmaster saying where and how the 404 error occurred. To do so, we first capture two environment variables: HTTP_REFERER, the address of the document that links to the missing one, and REQUEST_URI, the address of the missing document itself.
#!/usr/bin/perl
# All by HAbeTT
# Fucking off the 404
$refer = $ENV{'HTTP_REFERER'};
$locat = $ENV{'REQUEST_URI'};
Then we use a pipe on sendmail to report the error to the webmaster.
open(SENDMAIL, "|/usr/lib/sendmail -t") or die "Can't fork for sendmail: $!\n";
print SENDMAIL <<EOM;
From: e404\@habett.org
To: webmaster\@habett.org
Subject: Error 404

We have spotted a 404 error on our site.
This link is broken:
from $refer
to $locat
Please make corrections asap.
EOM
close(SENDMAIL) or warn "Sendmail didn't close nicely";
Now we have to redirect the visitor to a static page stating that the file was not found (404 error), that the webmaster has been notified, and that the broken link will be repaired soon.
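# Assumes e404.html sits at the site root. The leading slash matters:
# a relative path would be resolved against the missing address.
print "Location: /e404.html\n\n";
exit(0);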
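One case the script above does not cover: HTTP_REFERER is not set when the visitor typed the address directly or when the browser withholds the referrer, so the message would then show an empty "from" line. Both values also come straight from the request. The following is a minimal sketch of how they could be tidied before being written into the message; the helper name clean_value and the fallback labels are our own illustrative choices, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return a printable version of a request value,
# or a fallback label when the value is missing or empty.
sub clean_value {
    my ($value, $fallback) = @_;
    return $fallback unless defined $value && length $value;
    # Replace each run of control characters (including CR and LF)
    # with a single space so the value stays on one line of the message.
    $value =~ s/[\x00-\x1f\x7f]+/ /g;
    return $value;
}

# The fallback labels are illustrative assumptions.
my $refer = clean_value($ENV{'HTTP_REFERER'}, '(no referrer)');
my $locat = clean_value($ENV{'REQUEST_URI'},  '(unknown address)');
```

With this in place, $refer and $locat can be used in the message exactly as before.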
On our site, we use a slightly different, two-step strategy. First, we catch the error and build a hidden form with the related information (referrer and target); then we ask the visitor to submit the form in order to alert the webmaster. This strategy is more selective (it avoids reports triggered by bots) but still easy to implement.
A further refinement would be to distinguish between requests for a missing document that come from inside the site (broken links we can fix ourselves) and those that come from outside.
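The first step of this two-step strategy could look roughly like the sketch below. It is not the code running on our site: the form target /cgi-bin/e404report.cgi, the field names refer and locat, and the page wording are all assumptions made for illustration. The script that receives the form would then send the email as shown earlier.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape the characters that are special in HTML so that values taken
# from the request cannot break out of the attribute they are placed in.
sub escape_html {
    my ($text) = @_;
    $text = '' unless defined $text;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build the page shown to the visitor: a short message and a form whose
# hidden fields carry the referrer and the missing address.
sub build_report_page {
    my ($refer, $locat) = @_;
    my $r = escape_html($refer);
    my $l = escape_html($locat);
    # The action URL below is hypothetical.
    return <<"EOP";
<html><head><title>404 - File not found</title></head>
<body>
<p>The document you asked for could not be found.</p>
<form method="post" action="/cgi-bin/e404report.cgi">
<input type="hidden" name="refer" value="$r">
<input type="hidden" name="locat" value="$l">
<input type="submit" value="Alert the webmaster">
</form>
</body></html>
EOP
}

# Keep the 404 status so that robots do not index the error page.
print "Status: 404 Not Found\n";
print "Content-Type: text/html\n\n";
print build_report_page($ENV{'HTTP_REFERER'}, $ENV{'REQUEST_URI'});
```

Because no email leaves until a person presses the button, automated visitors that merely follow links generate no reports.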
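As a rough sketch of that distinction, the referrer's host name can be compared with the site's own. The function name classify_referer and the host www.example.org are placeholders we introduce here; the matching is deliberately simple and only looks at http and https addresses.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Placeholder host name: replace with the name your site answers to.
my $SITE_HOST = 'www.example.org';

# Classify a referrer as 'internal', 'external' or 'none'.
sub classify_referer {
    my ($refer, $site_host) = @_;
    return 'none' unless defined $refer && length $refer;
    # Capture the host part of an http(s) address, without any port.
    if ($refer =~ m{^https?://([^/:?#]+)}i) {
        return lc($1) eq lc($site_host) ? 'internal' : 'external';
    }
    # Anything we cannot parse is treated as coming from outside.
    return 'external';
}

my $kind = classify_referer($ENV{'HTTP_REFERER'}, $SITE_HOST);
```

The script could then, for example, put $kind in the subject line so that internal broken links, which we can repair ourselves, stand out in the webmaster's mailbox.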