After working on identifying the geographic location of a web site's visitors, and then analysing the logs of a few servers I'm responsible for, I began to wonder where the internet is from a geographic point of view. My server is in Japan, I am in France, Americans are everywhere, Koreans are bandwidth fat, Chinese and Indians are many but not too wired, ... you've heard it all before. I was surprised by the geographic logs of my servers and I wanted a reference point to compare them with.
Running the Sawmill log analyzer on a friend's computer gave the following data about the geographic location of visitors to habett.com and habett.org.
| Rank | habett.org | Share | habett.com | Share |
|---|---|---|---|---|
| 1 | France | 28.2 % | United States | 32.1 % |
| 2 | United States | 27.4 % | France | 26.5 % |
| 3 | Korea, Republic of | 11.6 % | Korea, Republic of | 9.1 % |
| 4 | United Kingdom | 3.1 % | Canada | 7.4 % |
| 5 | Iran, Islamic Republic of | 2.4 % | United Kingdom | 5.3 % |
| 6 | China | 2.3 % | China | 3.5 % |
| 7 | Canada | 2.2 % | Macao | 2.6 % |
| 8 | Germany | 1.7 % | Switzerland | 1.1 % |
| 9 | Tunisia | 1.7 % | India | 1.1 % |
| 10 | Switzerland | 1.5 % | Italy | 0.8 % |
| 11 | United Arab Emirates | 1.4 % | Belgium | 0.8 % |
| 12 | Italy | 1.1 % | Germany | 0.8 % |
| 13 | Japan | 0.9 % | Netherlands | 0.5 % |
| 14 | Morocco | 0.9 % | Denmark | 0.5 % |
| 15 | Belgium | 0.8 % | Romania | 0.5 % |
| 16 | Algeria | 0.7 % | Congo | 0.4 % |
| 17 | Egypt | 0.7 % | Algeria | 0.4 % |
| 18 | Malaysia | 0.7 % | Morocco | 0.4 % |
| 19 | India | 0.6 % | Portugal | 0.3 % |
| 20 | Brazil | 0.6 % | Australia | 0.3 % |
Right, we have it. Content on both sites is always available in French and in English, so countries like France, Switzerland or Belgium are over-represented. Korea was expected to be high, but not that high. The Islamic Republic of Iran comes as a surprise, as do Macao, Congo and the United Arab Emirates. I belong to the Long Tail, so I should expect weird statistics, but how weird are these figures?
Getting a true world representation of the internet would be tricky. There are local legal restrictions on internet access, there are high-bandwidth countries, there are internet-aware countries, ... My idea was to run a statistical analysis based on the http://ip-to-country.webhosting.info database. We already know this fine file, which lists IP ranges and tells us where they are located. Once analysed, it gives us a picture of how IPs are split between countries. Here comes the small Perl data cruncher:
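    use strict;
    use warnings;

    # Sum the size of every IP range per country, then print the top 40 shares.
    open(my $fh, '<', 'ip-to-country.csv') or die "Cannot open ip-to-country.csv: $!";
    my $total = 0;
    my %geo;
    while (my $line = <$fh>) {
        chomp $line;
        # Limit the split to 5 fields so a comma inside a country name survives.
        my ($beg, $end, undef, undef, $pays) = split /,/, $line, 5;
        next unless defined $pays;
        s/"//g for $beg, $end, $pays;
        my $sum = $end - $beg + 1;    # +1 assumes both ends of a range are inclusive
        $total += $sum;
        $geo{$pays} += $sum;
    }
    close($fh);

    # Countries sorted by decreasing number of IPs.
    my @orda = sort { $geo{$b} <=> $geo{$a} } keys %geo;
    foreach my $i (0 .. 39) {
        last if $i > $#orda;
        my $percent = 100 * $geo{$orda[$i]} / $total;
        printf "%s = %.2f\n", $orda[$i], $percent;
    }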
And here is the output, tidied into a table:
| Rank | Country | Share of IPs |
|---|---|---|
| 1 | United States | 69.02 % |
| 2 | Japan | 4.52 % |
| 3 | United Kingdom | 3.20 % |
| 4 | Germany | 2.66 % |
| 5 | Canada | 2.43 % |
| 6 | China | 2.27 % |
| 7 | Australia | 1.98 % |
| 8 | France | 1.73 % |
| 9 | Netherlands | 1.35 % |
| 10 | Korea, Republic of | 1.29 % |
| 11 | Italy | 0.91 % |
| 12 | Sweden | 0.69 % |
| 13 | Switzerland | 0.65 % |
| 14 | Spain | 0.59 % |
| 15 | Taiwan | 0.56 % |
| 16 | Brazil | 0.49 % |
| 17 | Norway | 0.40 % |
| 18 | Finland | 0.38 % |
| 19 | Russian Federation | 0.37 % |
| 20 | South Africa | 0.31 % |
| 21 | Mexico | 0.28 % |
| 22 | Poland | 0.28 % |
| 23 | Austria | 0.27 % |
| 24 | Belgium | 0.24 % |
| 25 | Denmark | 0.24 % |
| 26 | Hong Kong | 0.22 % |
| 27 | India | 0.19 % |
| 28 | Israel | 0.16 % |
| 29 | New Zealand | 0.16 % |
| 30 | Turkey | 0.15 % |
| 31 | Czech Republic | 0.13 % |
| 32 | Chile | 0.11 % |
| 33 | Hungary | 0.10 % |
| 34 | Ireland | 0.10 % |
| 35 | Argentina | 0.10 % |
| 36 | Singapore | 0.09 % |
| 37 | Portugal | 0.09 % |
| 38 | Greece | 0.09 % |
| 39 | Malaysia | 0.09 % |
| 40 | Thailand | 0.09 % |
We now have a global view, but the USA is over-represented because it hosts so many server farms. Keep in mind that these statistics only reflect the number of IPs, not the number of users.
We will now compare the global structure to our own representation. For each country, we divide its share of our visitors by its share of the world's IPs, to emphasise the real importance of each location. A ratio above 1 means the country visits us more than its number of IPs would suggest, and a ratio below 1 means less. This is the relevant ratio of marginal popularity.
| Global rank | Country | habett.org | habett.com |
|---|---|---|---|
| 1 | United States | 0.40 | 0.47 |
| 2 | Japan | 0.20 | 0.05 |
| 3 | United Kingdom | 0.97 | 1.64 |
| 4 | Germany | 0.64 | 0.30 |
| 5 | Canada | 0.91 | 3.06 |
| 6 | China | 1.01 | 1.55 |
| 7 | Australia | 0.25 | 0.15 |
| 8 | France | 16.30 | 15.29 |
| 9 | Netherlands | 0.14 | 0.37 |
| 10 | Korea, Republic of | 8.99 | 7.05 |
| 11 | Italy | 1.21 | 0.92 |
| 12 | Sweden | 0.43 | 0.32 |
| 13 | Switzerland | 2.31 | 1.69 |
| 14 | Spain | 0.51 | 0.44 |
| 15 | Taiwan | 0.18 | 0.04 |
| 16 | Brazil | 1.22 | 0.42 |
| 17 | Norway | 0.25 | 0.57 |
| 18 | Finland | 0.03 | 0.04 |
| 19 | Russian Federation | 0.54 | 0.18 |
| 20 | South Africa | 0.21 | 0.05 |
| 21 | Mexico | 1.19 | 0.15 |
| 22 | Poland | 0.48 | 0.41 |
| 23 | Austria | 0.41 | 0.07 |
| 24 | Belgium | 3.55 | 3.44 |
| 25 | Denmark | 0.63 | 1.92 |
| 26 | Hong Kong | 2.20 | 0.49 |
| 27 | India | 3.20 | 5.76 |
| 28 | Israel | 1.01 | 0.94 |
| 29 | New Zealand | 0.33 | 0.05 |
| 30 | Turkey | 1.82 | 0.44 |
| 31 | Czech Republic | 0.64 | 2.30 |
| 32 | Chile | 0.68 | 1.36 |
| 33 | Hungary | 0.19 | 0.42 |
| 34 | Ireland | 2.75 | 2.92 |
| 35 | Argentina | 0.67 | 1.54 |
| 36 | Singapore | 2.61 | 2.39 |
| 37 | Portugal | 1.59 | 3.80 |
| 38 | Greece | 0.94 | 0.64 |
| 39 | Malaysia | 7.60 | 1.15 |
| 40 | Thailand | 4.00 | 0.77 |
| 41 | Romania | 4.79 | 8.17 |
| 46 | Egypt | 15.90 | 0.76 |
| 52 | Iran, Islamic Republic of | 81.92 | 0.39 |
| 58 | United Arab Emirates | 67.93 | 8.04 |
| 76 | Morocco | 140.12 | 54.11 |
| 85 | Macao | 0.00 | 550.43 |
| 89 | Tunisia | 430.85 | 55.68 |
| 90 | Algeria | 195.32 | 108.13 |
| 198 | Congo | 44.00 | 4147.47 |
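
The ratios in this table appear to be each site's share divided by the global share of IPs: France on habett.org gives 28.2 / 1.73, which is about 16.30. Here is a minimal sketch of that computation, assuming this formula is the one used. The helper name `popularity_ratio` is hypothetical, and the sample rows are copied from the rounded figures in the tables above, so some results may differ in the last digit from a table built on unrounded values.

```perl
use strict;
use warnings;

# Hypothetical helper: share of a site's visitors divided by the share of
# the world's IPs, both in percent. Returns undef when the global share is 0.
sub popularity_ratio {
    my ($site_share, $global_share) = @_;
    return undef unless $global_share;
    return $site_share / $global_share;
}

# A few rows copied from the tables above: [habett.org %, global IP %].
my %sample = (
    'France'             => [28.2, 1.73],
    'United States'      => [27.4, 69.02],
    'Korea, Republic of' => [11.6, 1.29],
);

# Print one "country = ratio" line per sample row, in alphabetical order.
foreach my $country (sort keys %sample) {
    my ($site, $global) = @{ $sample{$country} };
    printf "%s = %.2f\n", $country, popularity_ratio($site, $global);
}
```

Countries that never show up in the global top 40, like Congo or Tunisia, have a very small global share, which is why their ratios explode as soon as a handful of visits come from there.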