Sometimes you build a site for someone and, despite your efforts, it gets no visitors, or only a few lonely surfers who wandered in at random. Even spamdexing won't help. Some clients judge the value of our work by the number of visitors, on the assumption that quality goes hand in hand with traffic. In such desperate cases, Perl might help you fake it.
Shibuya is a place in Tokyo, Japan, where there are a lot of people, but you sometimes get the feeling that it is always the same people coming back again.
Generating traffic on a site is easy with Perl and the dedicated LWP module, which mimics the behaviour of a web browser if used wisely. Think twice, though, and you'll discover two flaws in this approach.
First, to mimic a real browser, you shouldn't just request the home page: you must access other pages linked from the home page and also fetch images so that the session looks like a human one.
Second, you need to multiply your identity, otherwise the server will notice a single visitor fetching many pages. If all the hits are generated by the same LWP agent, it looks like someone browsing the same site for hours from the same machine, which is rubbish even to a half-decent log analyzer. To fake a multitude of visitors, we will go through proxies, each giving the impression of another visitor. To get a list of usable proxies, go to this site, for example. We store all the addresses in a text file named "proxies.txt", with one host:port entry on each line.
To improve our stealth, we will alternate the user-agent name among those of the most popular web browsers. Check out this site for examples.
Even if you trust your list of proxies, we'll start with a test phase to eliminate the slowest and the broken ones. This phase is optional but highly recommended. To do so, we will use the Benchmark module that comes with every Perl distribution:
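A malformed line in "proxies.txt" would only waste a timeout later, so it may be worth checking each entry as it is read. Here is a minimal sketch, assuming every entry has the form host:port; the sub name parse_proxy_line and the sample entries are made up for this illustration and are not part of Shibuya:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): split one line of
# proxies.txt into host and port, or return an empty list if it is malformed.
sub parse_proxy_line {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;                      # trim whitespace and newline
    return unless $line =~ /^([\w.-]+):(\d{1,5})$/;
    my ($host, $port) = ($1, $2);
    return unless $port >= 1 && $port <= 65535;   # valid TCP port range
    return ($host, $port);
}

# Example with made-up entries
for my $line ("proxy.example.com:8080", "not a proxy", "10.0.0.1:99999") {
    my ($host, $port) = parse_proxy_line($line);
    print defined $host ? "ok: $host port $port\n" : "skipped: $line\n";
}
```

Only the first of the three sample entries passes the check; the other two are reported as skipped.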
use LWP::UserAgent;
use Benchmark qw(:hireswallclock);   # sub-second wallclock timing
open (FILEIN, "< proxies.txt") or die ("Pb with file proxies.txt $!");
while ($proxy = <FILEIN>) {
    chomp($proxy);
    $beginn = new Benchmark;
    my $agent = new LWP::UserAgent;
    $agent->timeout(10);
    $uag = 'Mozilla/4.0 (compatible; MSIE 5.0; Windows 98; DigExt)';
    $agent->agent($uag);
    $agent->proxy(http => "http://$proxy");
    $request = new HTTP::Request('GET','http://www.google.com');
    $dapage = $agent->request($request);
    $ending = new Benchmark;
    # skip broken proxies: a quick failure must not be ranked as a fast proxy
    next unless $dapage->is_success;
    # timediff(new, old) returns new - old; real() is the elapsed wallclock time.
    # CPU time would be useless here, since the script mostly waits on the network.
    $ittook = timediff($ending,$beginn);
    $ttaken{$proxy} = $ittook->real;
}
close (FILEIN);
Now that the timing of each working proxy is stored in a hash, we write the results to a new file, "proxok.txt", sorted so that later we start with the fastest and can optionally stop the process when the slowest are reached.
# ascending sort on elapsed time: the fastest proxies come first
@proxok = sort {$ttaken{$a} <=> $ttaken{$b}} (keys %ttaken);
open (FILEOUT, "> proxok.txt") or die ("Pb file proxok.txt $!");
foreach (@proxok) {
    print FILEOUT "$_\n";
}
close (FILEOUT);
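The slowest proxies can also be dropped at this stage, before the file is written, instead of stopping later. Here is a small sketch, assuming the hash maps each proxy to its elapsed time in seconds; the sub name fastest_proxies, the 5-second cutoff and the sample timings are invented for the illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: return the proxies that answered within $max_seconds,
# fastest first. $times is a reference to a hash of proxy => seconds.
sub fastest_proxies {
    my ($times, $max_seconds) = @_;
    my @kept = grep { $times->{$_} <= $max_seconds } keys %$times;
    return sort { $times->{$a} <=> $times->{$b} } @kept;
}

# Made-up timings, in seconds
my %sample = (
    'proxy-a.example.com:8080' => 2.4,
    'proxy-b.example.com:3128' => 0.7,
    'proxy-c.example.com:80'   => 9.8,
);
print "$_\n" for fastest_proxies(\%sample, 5);
```

With these sample values, proxy-b is listed first, then proxy-a, and proxy-c is left out because it exceeds the cutoff.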
Now for the traffic generator itself. First we load the list of user-agents from the file "uas.txt" into an array, @uas. Then we loop through the proxies in the "proxok.txt" file (or "proxies.txt" if you've skipped the refining phase):
use LWP::UserAgent;
$aim = 'http://www.roux.to/';
open (UAS, "< uas.txt") or die ("Pb file uas.txt $!");
# chomp each line: a trailing newline would corrupt the User-Agent header
while (<UAS>) { chomp; push (@uas, $_); }
close (UAS);
open (FILEIN, "< proxok.txt") or die ("Pb file proxok.txt $!");
while ($proxy = <FILEIN>) {
    chomp($proxy);
Now, with the LWP module, we create the HTTP agent with the selected proxy and a random user-agent, fetch the desired page from the target site, and store its contents in the $content variable. To emphasise the schizophrenia of our agent, we give it a fake email address in the format described in RFC 822, using the from method (the sub that generates the fake addresses is at the end of the program).
    my $agent = new LWP::UserAgent;
    $agent->timeout(10);
    $uag = $uas[int rand(@uas)];
    $agent->agent($uag);
    # the parentheses matter: the sub is defined later in the file, so a bare
    # "fakemail" would be taken as the string 'fakemail'
    $fromail = fakemail();
    $agent->from($fromail);
    $agent->proxy(http => "http://$proxy");
    $request = new HTTP::Request('GET',$aim);
    $dapage = $agent->request($request);
    $content = $dapage->content;
The time has come to analyse the HTML page fetched from the site in order to generate further hits. We list the links in the page and "visit" them: we collect the hrefs, exclude mailto links and links to external sites, and keep only the file types we want:
    # the character class allows no ":" or "@", so mailto links never match
    @pothrefs = ($content =~ /href ?= ?"([\w\.\/]+)"/gi);
    @hrefs = ();
    foreach (@pothrefs) {
        next if /http/;                  # skip links to external sites
        next unless (/htm/ or /cgi/);    # /htm/ also matches "html"
        s/^\///;                         # $aim already ends with a slash
        push (@hrefs,$_);
    }
    foreach (@hrefs) {
        $request = new HTTP::Request('GET',"$aim$_");
        $dapage = $agent->request($request);
    }
To be even stealthier, we also fetch the images referenced in the main page, using the same method of src detection and filtering:
    @potimages = ($content =~ /src ?= ?"([\w\.\/]+)"/gi);
    @images = ();
    foreach (@potimages) {
        next if /http/;
        next unless (/gif/ or /png/ or /jpg/ or /jpeg/);
        s/^\///;
        push (@images,$_);
    }
    foreach (@images) {
        $request = new HTTP::Request('GET',"$aim$_");
        $dapage = $agent->request($request);
    }
}   # end of the loop over the proxies
close (FILEIN);
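To see what the href and src filters let through, it helps to run them offline on a small page. The sketch below wraps the same regular expression in a sub; the name local_refs and the sample HTML are made up for the illustration. Note that anything containing a colon, such as a mailto link or an absolute URL, already fails the character class, and that attributes written with single quotes or without quotes are missed entirely:

```perl
use strict;
use warnings;

# Hypothetical helper: return the relative references found in $html for the
# given attribute ("href" or "src") whose names contain one of @types.
sub local_refs {
    my ($html, $attr, @types) = @_;
    my $types = join '|', map { quotemeta($_) } @types;
    my @found = ($html =~ /$attr ?= ?"([\w\.\/]+)"/gi);
    my @refs;
    for my $ref (@found) {
        next if $ref =~ /http/;            # external link
        next unless $ref =~ /$types/;      # unwanted file type
        $ref =~ s/^\///;                   # the base URL ends with a slash
        push @refs, $ref;
    }
    return @refs;
}

# Made-up sample page
my $html = <<'HTML';
<a href="/news.html">News</a> <a href="mailto:me@example.com">Mail</a>
<a href="http://other.example.com/x.html">Elsewhere</a>
<img src="img/logo.gif"> <img src='img/missed.png'>
HTML

print join(',', local_refs($html, 'href', 'htm', 'cgi')), "\n";          # news.html
print join(',', local_refs($html, 'src', 'gif', 'png', 'jpg', 'jpeg')), "\n";  # img/logo.gif
```

The single-quoted image is silently skipped, which is one reason a real HTML parser would be more reliable than a regular expression here.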
Finally, here is the sub that generates fake email addresses for the from property of our fake agent:
sub fakemail {
    my @pre=('info','admin','master','boss','slave','abuse','josh');
    my @doma=('free','random','first','alpha','post','pre','future');
    my @domb=('service','info','porn','mail','internet','music','stuff');
    my @ext=('com','net','org','fr','co.uk');
    # Perl concatenates strings with ".", while "+" is numeric addition
    my $email = $pre[int rand(@pre)].'@'.$doma[int rand(@doma)];
    $email .= "-" if (rand() > 0.5);
    $email .= $domb[int rand(@domb)].".".$ext[int rand(@ext)];
    return $email;
}
Shibuya is effective even though it has many limitations and flaws (frames and cookies, to name a few). I'd be glad to receive your opinions and ideas for improvements; please contact me at habettt@habett.org.