Google Dance monitoring

The need

My name is Stéphane Roux. It's a pretty common first and last name here in France. Too common. There are so many Stéphane Roux on the internet. It's no obsession of mine, but I like to see whether I am one of the best-known Stéphane Roux on the internet. To find out, you just have to search for my name on Google and see if some of my websites turn up. As far as I can tell, I am the only Stéphane Roux who uses Habett as his alias. Habett shows up in my URLs, so this should be easy to monitor.

The query http://www.google.com/search?q=stephane%20roux shows the results, but because of the way Google works, new sites appear, others disappear and rankings shift, so the order of the results changes over time. The most relevant sites are supposed to appear first according to their Google rank, an internal figure that Google calculates with a complex, secret formula.

The idea is to have a program that monitors my Google placement.

Automating queries on Google

Google is a site like no other, as you know, and its policy is to refuse visits from the web spiders and robots we're used to. Check their robots.txt file for the details. To gain access to Google's content, you must use an XML API. This isn't the easiest way of dealing with information, but the SOAP::Lite Perl module makes it pretty straightforward to handle. I have to warn you that this module has many dependencies, so its installation can be pretty complex.

If you don't use the designated API and make direct LWP queries instead, you might be blacklisted by Google, which is trouble to be avoided at any cost. Access to the API is free and just requires a simple registration. Once you've got your personal Google developer key and kit, you are allowed 1000 queries a day.

The monitoring

Here's my small monitoring script. Insert your developer key and place the GoogleSearch.wsdl file from the kit in the same folder.

#!/usr/local/bin/perl -w
# My GoogleDance monitor

use strict;
use SOAP::Lite;

my $google_key = 'PASTE YOUR DEVELOPER KEY HERE';

my $google_search = SOAP::Lite->service("file:GoogleSearch.wsdl");

# The query: first 10 results, no filtering, no restriction, no SafeSearch
my $results = $google_search->doGoogleSearch($google_key, "stephane roux",
  0, 10, "false", "", "false", "", "latin1", "latin1");
($results and @{$results->{resultElements} || []}) or exit;

# Loop through results, keeping the first rank found for each site
# (0 means the site is not in the results)
my ($rank, $org, $com) = (0, 0, 0);
foreach my $result (@{$results->{resultElements}}) {
  $rank++;
  $org = $rank if (($result->{URL} =~ /habett\.org/i) and !($org));
  $com = $rank if (($result->{URL} =~ /habett\.com/i) and !($com));
}

# date calculation for the log: year followed by the day of the year
my @date = localtime;
my $today = sprintf "%04d%03d", $date[5]+1900, $date[7];

# write log file
open (LOG,">>dance.log") or die "Cannot open dance.log: $!";
print LOG "$today $org $com\n";
close (LOG);
exit(0);

This generates a file containing lines that go something like this (the year and the day of the year, then the rank of habett.org and the rank of habett.com):

2004278 4 1
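
To read such a line back, the first field has to be split into a year and a day of the year. Here is a minimal sketch of that decoding. The parse_log_line name is mine and is not part of the scripts in this article, and it assumes the day number is the zero-based one that localtime returns.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);
use POSIX qw(strftime);

# Hypothetical helper: decode one line of dance.log.
# Assumes the format written by the monitor: YYYYDDD org_rank com_rank,
# where DDD is the zero-based day of the year given by localtime.
sub parse_log_line {
  my ($line) = @_;
  my ($stamp, $org, $com) = split /\s+/, $line;
  my ($year, $yday) = $stamp =~ /^(\d{4})(\d{3})$/
    or die "Unexpected date field: $stamp\n";
  # January 1st of that year plus $yday days, computed in UTC
  my $epoch = timegm(0, 0, 0, 1, 0, $year) + $yday * 86400;
  return (strftime("%Y-%m-%d", gmtime($epoch)), $org, $com);
}

my ($date, $org, $com) = parse_log_line("2004278 4 1");
print "$date: habett.org is #$org, habett.com is #$com\n";
```

The calendar date is only rebuilt for display. The grapher further down ignores the first field and simply plots the lines in the order they were logged.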

Take good note that each query is limited to 10 results. In our case, we are only interested in the first ten results, but if the Google Dance you want to monitor is wider, you'll have to repeat the query as many times as needed, incrementing the third parameter by 10 each time.
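
The repeated query can be sketched as a loop over start offsets. This is only an illustration: find_rank and the $fetch_page callback are hypothetical names of mine, and the callback stands in for a wrapper around doGoogleSearch so that the example runs without a developer key.

```perl
use strict;
use warnings;

# Hypothetical helper: look for the first URL matching $pattern,
# asking for pages of 10 results until $max_pages have been read.
# $fetch_page is a code reference that takes the start offset
# (0, 10, 20, ...) and returns a list of URLs; in the real script
# it would wrap the doGoogleSearch call.
sub find_rank {
  my ($fetch_page, $pattern, $max_pages) = @_;
  for my $page (0 .. $max_pages - 1) {
    my $start = $page * 10;
    my @urls = $fetch_page->($start);
    last unless @urls;                 # no more results
    for my $i (0 .. $#urls) {
      return $start + $i + 1 if $urls[$i] =~ $pattern;
    }
  }
  return 0;                            # not found
}

# Stand-in data instead of a live query: 25 fake URLs
my @fake = map { "http://example.com/page$_" } 1 .. 25;
$fake[13] = "http://www.habett.org/";
my $fetch = sub {
  my ($start) = @_;
  my $end = $start + 9;
  $end = $#fake if $end > $#fake;
  return $start > $#fake ? () : @fake[$start .. $end];
};

print find_rank($fetch, qr/habett\.org/i, 3), "\n";
```

Keep in mind that every extra page is one more query taken out of the 1000 you are allowed each day.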

Use the task scheduler or a crontab to run this program on a daily basis, and leave it working for some time before going any further.

Aggregating the results

We'll use the GD library through its Perl module to generate a simple graph of the results, drawing one line for each site in a different color.

#!/usr/local/bin/perl -w
# My GoogleDance grapher

use strict;
use GD;

# GD objects inits
my $graph = new GD::Image(600,220);
my $white = $graph->colorAllocate(255,255,255);
my $corg = $graph->colorAllocate(0,0,0);
my $ccom = $graph->colorAllocate(153,204,153);
my $red = $graph->colorAllocate(255,0,0);
$graph->transparent($white);
$graph->interlaced('true');

# read log file (the date field is not needed for the graph)
my (@org, @com);
open (LOG,"< dance.log") or die "Cannot open dance.log: $!";
while (my $line = <LOG>) {
  chomp ($line);
  my (undef,$dotorg,$dotcom) = split (/\s/,$line);
  push (@org,$dotorg);
  push (@com,$dotcom);
}
close (LOG);

# at least two log entries are needed to draw a line
@org > 1 or die "Not enough data in dance.log\n";

# generating a line for both sites
my $width = 600 / (scalar @org - 1);
for (my $i = 2; $i <= scalar @org; $i++) {
  $graph->line(($i-2)*$width,220-$org[$i-2]*20,($i-1)*$width,220-$org[$i-1]*20,$corg);
}
for (my $i = 2; $i <= scalar @com; $i++) {
  $graph->line(($i-2)*$width,220-$com[$i-2]*20,($i-1)*$width,220-$com[$i-1]*20,$ccom);
}

# baseline
$graph->line(0,201,600,201,$red);

# save the image
open (IMAGE, "> dance.png") or die "Cannot write dance.png: $!";
binmode IMAGE;
print IMAGE $graph->png;
close (IMAGE);

exit(0);
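
The arithmetic behind the two drawing loops is easier to check on its own, without GD. The sketch below uses a hypothetical rank_points helper of mine that applies the same formulas as the grapher: x is the entry number times the step width, y is 220 minus 20 pixels per rank. Since GD puts its origin in the top left corner, rank 1 lands at y = 200, just above the red baseline.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a list of daily ranks into [x, y] points
# using the same arithmetic as the grapher (600 pixels wide, 220 high,
# 20 pixels per rank).
sub rank_points {
  my @ranks = @_;
  return () if @ranks < 2;             # a line needs two points
  my $width = 600 / (@ranks - 1);
  return map { [ $_ * $width, 220 - $ranks[$_] * 20 ] } 0 .. $#ranks;
}

# Four days of made-up ranks
my @points = rank_points(4, 4, 2, 1);
print join(" ", map { "($_->[0],$_->[1])" } @points), "\n";
```

Each pair of consecutive points corresponds to one call to line in the grapher.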

A few months' worth of Google dance for Stéphane Roux
