Headers in Usenet posts

02/08/01

This is a study based on the messages (posts) from the many newsgroups I read. I exported the data in #! rnews format and wrote a small Perl program that analyzes the headers it finds and compiles statistics. The #! rnews format is a plain text file in which messages are separated by a single line beginning with #! rnews followed by a numerical reference. The analysis is simple: the lines after the separator are headers, up to the blank line that separates them from the content (body) of the message.

My Perl implementation is really basic: it searches for occurrences of the different headers and stores up to three different example values for each. It then produces an HTML table:

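# Count every header found in the rnews files under news/ and keep
# up to three distinct example values for each header.
$count = 0;
while ($file = glob("news/*")) {
  open (INFILE, '<', $file) or die "Cannot open $file: $!";
  $inhead = 0;
  while ($ligne = <INFILE>) {
    if (index($ligne,'#! rnews') == 0) {
      # separator line: a new message starts, its headers follow
      $count++;
      $inhead = 1;
    } elsif (($inhead == 1) and ($ligne eq "\n")) {
      # blank line: end of the headers, start of the body
      $inhead = 0;
    } elsif (($inhead == 1) and (index ($ligne, ":") > 2)) {
      # header line: split on the first ": " only, so that values which
      # themselves contain ": " (such as "Re: ...") are kept whole
      ($typehead,$value) = split (/: /, $ligne, 2);
      $dep{$typehead}++;
      chomp($value);
      # escape for HTML output (& first, to avoid double escaping)
      $value =~ s/&/&amp;/g;
      $value =~ s/</&lt;/g;
      $value =~ s/>/&gt;/g;
      # store the value if it differs from the examples already kept
      unless ($expl{$typehead}[1]) {
        $expl{$typehead}[1] = $value;
      } elsif (!($expl{$typehead}[2]) and ($expl{$typehead}[1] ne $value)) {
        $expl{$typehead}[2] = $value;
      } elsif (!($expl{$typehead}[3]) and ($expl{$typehead}[1] ne $value) and ($expl{$typehead}[2] ne $value)) {
        $expl{$typehead}[3] = $value;
      }
    }
  }
  close (INFILE);
}
print "Process on $count messages\n\n";
print "<table border=1>";
# one row per header, most frequent first
foreach (sort {$dep{$b} <=> $dep{$a}} keys %dep) {
  print "<tr><td>$_</td><td>$dep{$_}</td><td><ul>\n";
  for $i (1..3){
    print "<li>$expl{$_}[$i]</li>\n" if ($expl{$_}[$i]);
  }
  print "</ul></td></tr>\n";
}
print "</table>";
exit(0);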
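
The script above looks at each line on its own, so a header folded over several lines (continuation lines starting with a space or tab) only contributes its first line. Here is a minimal sketch of one way to join such lines to the header they belong to. The sample batch, its numbers and the name parse_batch are invented for illustration and are not taken from the real data; this is not the code that produced the statistics.

```perl
use strict;
use warnings;

# Invented two-message sample in "#! rnews" format (not from the real data).
my $batch = <<'END';
#! rnews 101
Subject: Re: RISCPC for sale
X-Comments: first line of a long comment
 continued on a second line
Lines: 2

body text: this colon must not be counted
more body
#! rnews 102
Subject: FS
Lines: 1

body
END

# Parse a batch held in a string. Returns the message count, a hashref of
# header counts and a hashref holding the first value seen for each header.
sub parse_batch {
    my ($text) = @_;
    my ($count, $in_head, $last) = (0, 0, undef);
    my (%seen, %first);
    for my $line (split /\n/, $text, -1) {
        if (index($line, '#! rnews') == 0) {        # separator: new message
            $count++;
            $in_head = 1;
            undef $last;
        } elsif ($in_head && $line eq '') {         # blank line: body starts
            $in_head = 0;
        } elsif ($in_head && $line =~ /^[ \t]+(.*)/) {
            # continuation line: append it to the previous header's value
            $first{$last} .= " $1" if defined $last && $seen{$last} == 1;
        } elsif ($in_head && $line =~ /^([^:\s]+):\s*(.*)$/) {
            $last = $1;                             # header name
            $seen{$last}++;
            $first{$last} = $2 if $seen{$last} == 1;
        }
    }
    return ($count, \%seen, \%first);
}

my ($count, $seen, $first) = parse_batch($batch);
print "$count messages\n";
print "$_: $seen->{$_} (e.g. $first->{$_})\n" for sort keys %$seen;
```

Only the first value of each header is kept here, to keep the sketch short; the three-example logic of the original script could be plugged in at the same place.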

Here are the results I got at the time of writing, sorted by frequency:

Process on 2176 messages

Each entry gives the header name, its number of occurrences in parentheses, and up to three example values.

Lines (2176)
  • 11
  • 66
  • 107
Newsgroups (2176)
  • alt.acorn.adverts
  • alt.acorn.adverts,comp.sys.acorn.hardware,comp.sys.acorn.networking,comp.sys.acorn.misc
  • alt.acorn.adverts,alt.ads
Message-ID (2176)
  • <9jres2$nhi$1@plutonium.btinternet.com>
  • <ab774da14a.roger@nifty.demon.co.uk>
  • <9k3g5c$7c$37@usenet.otenet.gr>
Subject (2176)
  • RISCPC for sale
  • FS
  • 10,000,000 registered members with thousands of sexy photos! It's FREE
From (2176)
  • "Peter Howard" <jodrell@btinternet.com>
  • Roger W Wylde <roger@niftysoftware.co.uk>
  • "4Free Teen" <bm-oddtlo@otenet.gr>
Date (2176)
  • Fri, 27 Jul 2001 11:24:20 +0100
  • Sat, 28 Jul 2001 20:38:05 +0100
  • Sat, 28 Jul 2001 19:59:07 +0300
Path (2176)
  • wanadoo.fr!opentransit.net!news.tele.dk!212.74.64.35!colt.net!dispose.news.demon.net!demon!btnet-peer0!btnet-feed5!btnet!mendelevium.btinternet.com!not-for-mail
  • wanadoo.fr!freenix!skynet.be!dispose.news.demon.net!news.demon.co.uk!demon!nifty.demon.co.uk%roger
  • wanadoo.fr!opentransit.net!news.tele.dk!213.204.128.162!news000.worldonline.se!news-out.nuthinbutnews.com!propagator!feed2.newsfeeds.com!newsfeeds.com!news-in-austin.nuthinbutnews.com!newsfeed-zh.ip-plus.net!news.ip-plus.net!Amsterdam.Infonet!News.Amsterdam.UnisourceCS!news.otenet.gr!not-for-mail
NNTP-Posting-Host (2040)
  • host62-7-95-184.btinternet.com
  • nifty.demon.co.uk
  • volo530-a204.otenet.gr
X-Trace (1992)
  • news.demon.co.uk 996349266 nnrp-08:18456 NO-IDENT nifty.demon.co.uk:158.152.44.121
  • usenet.otenet.gr 996492268 236 62.103.224.204 (30 Jul 2001 11:24:28 GMT)
  • typhoon.kc.rr.com 996532877 24.130.237.200 (Mon, 30 Jul 2001 17:41:17 CDT)
Organization (1982)
  • BT Internet
  • Nifty Software
  • An OTEnet S.A. customer
References (1775)
  • <9k9jsu$52v$1@plutonium.btinternet.com>
  • <9js4fb$jr1k$1@rn.area.com>
  • <vjvqkto2vajpj26kql5vvn1kl4g0g1qm9g@4ax.com>
X-Complaints-To (1567)
  • abuse@demon.net
  • abuse@otenet.gr
  • abuse@rr.com
User-Agent (1036)
  • Messenger-Pro/2.50 (MsgServe/1.50) (RISC-OS/4.02)
  • slrn/0.9.6.3 (Linux)
  • MacSOUP/D-2.4.6 (unregistered)
NNTP-Posting-Date (989)
  • Mon, 30 Jul 2001 11:24:28 +0000 (UTC)
  • Mon, 30 Jul 2001 17:41:17 CDT
  • 1 Aug 2001 07:08:41 GMT
X-Newsreader (711)
  • Microsoft Outlook Express 6.00.2462.0000
  • Microsoft Outlook Express 5.50.4133.2400
  • Microsoft Outlook Express 6.00.2479.0006
Content-Type (632)
  • text/plain;
  • text/plain; charset=us-ascii
  • text/plain; charset=ISO-8859-1
Content-Transfer-Encoding (503)
  • 7bit
  • 8bit
  • 7Bit
Xref (428)
  • Jesus.wanadoo.fr alt.acorn.adverts:155 comp.sys.acorn.hardware:7425 comp.sys.acorn.networking:1776 comp.sys.acorn.misc:10613
  • Jesus.wanadoo.fr comp.sys.acorn.announce:513 comp.sys.acorn.apps:7456 comp.sys.acorn.games:960 comp.sys.acorn.hardware:7422 comp.sys.acorn.misc:10612 comp.sys.acorn.networking:1775
  • Jesus.wanadoo.fr comp.sys.acorn.apps:7559 comp.sys.acorn.misc:10733
Reply-To (371)
  • "Jon Kreski" <webmaster@clickfordollars.com>
  • "Web Wizards - PHP/PERL Developers" <mh2@isis.co.za>
  • gapope@vcn.bc.ca
MIME-Version (313)
  • 1.0
X-MimeOLE (307)
  • Produced By Microsoft MimeOLE V6.00.2462.0000
  • Produced By Microsoft MimeOLE V5.50.4133.2400
  • Produced By Microsoft MimeOLE V6.00.2479.0006
X-Priority (300)
  • 3
  • 911
  • 3
X-MSMail-Priority (298)
  • Normal
  • Normal
Mime-Version (284)
  • 1.0
  • 1.0 (WebTV)
X-NNTP-Posting-Host (282)
  • nifty.demon.co.uk:158.152.44.121
  • zaynar.demon.co.uk:158.152.90.16
  • xemu.demon.co.uk:158.152.196.209
X-Accept-Language (192)
  • en
  • po fene,ru
  • en,en-US,en-GB,ja,af
X-Mailer (185)
  • Hustler Mailer
  • Mozilla 4.78 [en] (Windows NT 5.0; U)
  • Mozilla 4.75 [en] (Win98; U)
Approved (162)
  • Self-Moderation <authoring-cgi@boutell.com>
  • nick
  • a.m.conroy@argonet.co.uk (Andrew Conroy)
X-Orig-Path (159)
  • btinternet.com!domino_
  • FreeNet.co.uk%davehigton
  • steve-c.co.uk!stephen
X-Original-NNTP-Posting-Host (156)
  • 203.29.154.54
  • 62.60.47.177
  • 62.60.43.202
X-Posting-Agent (150)
  • Playboy Poster
  • Hamster/1.3.22.0
  • Hamster/1.3.21.0
X-Editor (102)
  • Zap, using ZapEmail 0.22 (27 Nov 1998) patch-3
  • Zap 1.44 (19 May 2000) [TEST 3], ZapEmail 0.25 (18 Mar 2000) test-2
  • Zap 1.44 (06 Jul 2001) [TEST 6], ZapEmail 0.26 (14 Jun 2001) test-2
Sender (85)
  • Ian
  • resurrector@mindspring.com (Guido the Resurrector)
  • Greg Mildenhall <gregm@pc-121.cs.uwa.edu.au>
X-Sender (72)
  • 510035616940-0001@t-dialin.net
  • 340067917323-0001@t-dialin.net
  • 520042741018-0001@t-dialin.net
Distribution (70)
  • world
X-Admin (41)
  • news@aol.com
  • news@cs.com
Followup-To (37)
  • comp.sys.acorn.hardware
  • comp.sys.acorn.misc
  • comp.sys.acorn.apps
Cache-Post-Path (32)
  • queeg.ludd.luth.se!unknown@ny.sm.luth.se
  • newsreader-hpw1.net.bms.com!unknown@a048641-dyp1164642dys.war.zim.bms.com
  • ananke.eclipse.net.uk!unknown@212.104.153.172
X-Abuse-Info (32)
  • Please be sure to forward a copy of ALL headers
  • Otherwise we will be unable to process your complaint properly.
  • Otherwise we will be unable to process your complaint properly
X-No-Archive (31)
  • yes
  • Yes
X-Cache (31)
  • nntpcache 2.4.0b5 (see http://www.nntpcache.org/)
  • nntpcache 2.3.3 (see http://www.nntpcache.org/)
  • nntpcache 2.4.0b4 (see http://www.nntpcache.org/)
X-Processed (29)
  • Monty (v.1.24)
X-Report (22)
  • Report abuse to nntpabuse@vip.uk.com
  • Report abuse to abuse@netscapeonline.co.uk
  • Please report illegal or inappropriate use to <abuse@newsfeeds.com>
X-NNTP-Poster (21)
  • NewsHound v1.33
  • NewsHound v1.35ß
X-Comments (20)
  • GtR Repost
  • than likely by someone other than the original poster. Please
  • see the end of this posting for a copy of the cancel.
X-Server-Date (18)
  • 29 Jul 2001 01:52:44 GMT
  • 29 Jul 2001 01:39:54 GMT
  • 29 Jul 2001 01:44:35 GMT
Mail-Copies-To (16)
  • never
  • nobody
In-Reply-To (15)
  • <m6a97.1974$e%4.50598@news3.oke.nextra.no>
  • <4aa0e2fec3stephen@steve-c.co.uk>
  • <4aa0a3f849kell@locsource.com>
X-OS (13)
  • RISC OS 4.02
X-Received-Date (13)
  • Thu, 26 Jul 2001 22:39:22 PDT (newsmaster1.prod.itd.earthlink.net)
  • Tue, 31 Jul 2001 21:05:02 PDT (newsmaster1.prod.itd.earthlink.net)
  • Tue, 31 Jul 2001 21:33:29 PDT (newsmaster1.prod.itd.earthlink.net)
X-Face (12)
  • 1+aL2.S<C:UD\AQBwLzd95j+NaYgl`9_jr8LEE;:a5UmNGxU=-]*hm]>&^e[b
  • Exf_JoskzTiRK!78R)Ouvl"U|NO&qXO,Dyo}f#}N<"zE`~w~8&&s8X@^N7sr-nEz`ro>R1CMuB{\5M9I[/l)`UT(k0!Ow?K\Fg+=Fo@%*'-@Ih`7rzN5*B[rT<,Ap;L.Pl~dirqVy$im!bo>I@ew*i[EE,T*0$jcxp41
  • B*S,_h,:U<<2\9CoqZ+"Z<es;.M'/Y[DUX2PziYd.G:.W!"f9RVlov#B+6:~r
X-X-Sender (12)
  • <kimotol@uhunix5>
  • <rlehy@chimay>
X-Computer (11)
  • Acorn RiscPC 700, StrongARM 267MHz, 24+2MB RAM, 4.2GB+850MB HD
  • Acorn RiscPC SA-110
X-Original-Trace (10)
  • 30 Jul 2001 22:00:48 +1000, 203.29.154.54
  • 30 Jul 2001 22:15:17 +1000, 203.29.154.54
  • 28 Jul 2001 18:05:54 +1000, 202.138.17.52
X-Comments3 (10)
  • IMPORTANT
X-Comments2 (10)
  • IMPORTANT
Keywords (10)
  • ALL YOUR BASE ARE BELONG TO US
  • nihilism, random, chaos
  • Acorn Guide Newsgroups FAQs Newcomers
Expires (10)
  • Sat, 4 Aug 2001 12:01:41 +1200
  • Sat, 18 Aug 2001 03:43:01 GMT
  • Sat, 18 Aug 2001 03:43:11 GMT
X-Mangled (9)
  • using ZapEmail's AST (ygolonhceT mapS-itnA)
X-MIMEOLE (9)
  • Produced By Microsoft MimeOLE V5.50.4522.1200
  • Produced By Microsoft MimeOLE V4.72.3110.3
  • Produced By Microsoft MimeOLE V5.00.2615.200
X-no-markup (8)
  • yes
X-Organization (7)
  • Alpha Programming
  • Photodesk Ltd, Portland. Telephone +44 (0) 1305 822753
  • Warm Silence Software Ltd
X-NFilter (7)
  • 1.2.1-b1
Cancel-Lock (7)
  • sha1:aB22uhxRqjf0uAkKSNppnsbIgKA=
  • sha1:IZa0KpgyjiD6rjsfEv0Qy9vOMEk=
  • sha1:gDjrOcTtHDj2W4s9VfNOb9E8pUY=
X-HTTP-Posting-Host (6)
  • dyn-213-36-140-110.ppp.libertysurf.fr
  • dyn-213-36-0-248.ppp.libertysurf.fr
  • dyn-213-36-23-201.ppp.libertysurf.fr
Posted-And-Mailed (6)
  • yes
Summary (6)
  • Newcomers' Guide to the Acorn Newsgroups and FAQs.
X-Question (6)
  • Do you really find headers very interesting?
X-Originating-User (5)
  • 144.92.164.196
  • 193.48.70.124
  • 213.19.7.83
X-Originating-Host (5)
  • 144.92.164.196
  • 193.48.70.124
  • 213.19.7.83
X-IRC-Identity (5)
  • ircnet:gerph
Content-transfer-encoding (5)
  • 7bit
  • 8bit
X-System (5)
  • ArcadeLink
X-Comment (5)
  • A hollow voice whispers diaxos
Supersedes (5)
  • <Xns90EBBAFB0140BRiXiDiXi@161.48.128.20>
  • <fr.chartes.comp.lang.perl-995254966.259465@ns2.freenix.org>
  • <fr.chartes.soc.internet-995254974.996785@ns2.freenix.org>
Errors-To (5)
  • PostMaster@arcade.demon.co.uk
X-Complaints-To2 (5)
  • abuse@foorum.com
  • abuse@foorum.fr
Mail-Reply (4)
  • 1
X-Forwarded (4)
  • by - (DeleGate/7.3.0)
  • by - (DeleGate/6.1.20)
X-Authenticated-User (4)
  • xtmlnews
  • web17
X-no-Archive (3)
  • Yes
X-DMCA-Notifications (3)
  • http://www.giganews.com/info/dmca.html
X-Foorum_user_id (3)
X-POSTER (3)
  • foorum.com
X-distributed-net-Team (3)
  • Acorn Users Group (#4266)
NNTP-Posting-User (3)
  • pck
  • sbly
  • lperek
X-PGP-Sig (2)
  • 6.5.8 From,Newsgroups,Subject,Message-ID
X-Repost-Date (2)
  • 31 Jul 2001 22:30:01 GMT
  • 31 Jul 2001 22:45:26 GMT
Filter-X-Trace (2)
  • 996710666 newscene.com 17922 216.40.21.80
  • 996742968 newscene.com 17934 216.40.21.80
Content-Disposition (2)
  • Inline
X-URL (2)
  • http://habett.org/
  • http://www.alcyone.com/buffoon/
X-Original-Path (2)
  • sn-us!sn-xit-02!supernews.com!news.tele.dk!209.30.0.50!nntp.flash.net!easynews!e420r-sjo4.usenetserver.com!newsfeed.usenetserver.com!e420r-sjo3.usenetserver.com.POSTED!not-for-mail
  • sn-us!sn-xit-01!supernews.com!feeder.qis.net!dispose.news.demon.net!demon!xara.net!gxn.net!server6.netnews.ja.net!server4.netnews.ja.net!news5-gui.server.ntli.net!ntli.net!news6-win.server.ntlworld.com.POSTED!not-for-mail
X-Mimeole (2)
  • Produced By Microsoft MimeOLE V4.72.3110.3
  • Produced By Microsoft MimeOLE V5.00.3018.1300
X-Attribution (2)
  • T.A.
X-Reposted-By (2)
  • resurrector@mindspring.com (Guido the Resurrector)
X-WebTV-Signature (2)
  • 1
X-Original-Message-ID (2)
  • <bp497.10244$Fe4.449169@e420r-sjo3.usenetserver.com>
  • <uSt97.46436$SK6.6014656@news6-win.server.ntlworld.com>
Filter-NNTP-Posting-Host (2)
  • 216.40.21.80
X-Organisation (2)
  • Somewhat chaotic
X-PGP-Keys (2)
  • 0x53E9615A(DH/DSS),0x97B81D97(RSA) @ http://autechre.de/keys.asc
X-No-Productlink (2)
  • yes
Mime-version (1)
  • 1.0
Followupto (1)
  • Poster
X-Msmail-Priority (1)
  • Normal
X-PGP-fingerprint (1)
  • 6A61 3673 55ED C547 0C5A 4ABD 44F8 8CB2 A125 E138
X-No-Productlinks (1)
  • Yes
X-sekritcode (1)
  • aqjwue
X-BBC-Trace (1)
  • MTMyLjE4NS4yMTAuMTc2
X-Proxy-Client (1)
  • marriott@uiuc.edu from ppp1-10.cu.soltec.net
X-c't-Krypto-Kampagne (1)
  • http://www.heise.de/ct/pgpCA/
X-Processed-By (1)
  • BudgieSoft NewsFudge v1.17
Return-Receipt-To (1)
  • tim@southfrm.demon.co.uk
Archive (1)
  • no
X-Cricket (1)
  • Well, let's look on the bright side. It might rain for five days.
X-User-Info (1)
  • 203.167.236.226 203.167.236.226 steve.knutson
X-mang (1)
  • yes
X-Hello-Kitty (1)
  • meow meow.
X-Brought-To-You-By (1)
  • The letter Q and the number 12
X-Peer2Peer-Protocol (1)
  • Dummies to dummies
X-Posted-By (1)
  • fart 1.3
X-PGP-key-at (1)
  • http://users.durge.org/~ngb/key.gpg.asc
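
Because the script compares header names exactly as written, spellings that differ only by case (MIME-Version, Mime-Version and Mime-version, for instance) are counted as separate headers in the results above. Header names are generally treated as case-insensitive, so it may make sense to merge them. A minimal sketch of that merge follows; the counts are copied from the results above, while the helper name merge_case_variants is an assumption made for the example.

```perl
use strict;
use warnings;

# Counts copied from the results above, for headers that differ only by case.
my %dep = (
    'MIME-Version'              => 313,
    'Mime-Version'              => 284,
    'Mime-version'              => 1,
    'X-MimeOLE'                 => 307,
    'X-MIMEOLE'                 => 9,
    'X-Mimeole'                 => 2,
    'Content-Transfer-Encoding' => 503,
    'Content-transfer-encoding' => 5,
    'X-No-Archive'              => 31,
    'X-no-Archive'              => 3,
    'X-MSMail-Priority'         => 298,
    'X-Msmail-Priority'         => 1,
);

# Merge counts of names that are equal when lowercased. The merged entry is
# labelled with the most frequent spelling (hypothetical helper).
sub merge_case_variants {
    my ($counts) = @_;
    my (%total, %label);
    # visit names from most to least frequent so the first spelling seen wins
    for my $name (sort { $counts->{$b} <=> $counts->{$a} || $a cmp $b } keys %$counts) {
        my $key = lc $name;
        $label{$key} //= $name;
        $total{$key} += $counts->{$name};
    }
    return { map { $label{$_} => $total{$_} } keys %total };
}

my $merged = merge_case_variants(\%dep);
printf "%-28s %d\n", $_, $merged->{$_}
    for sort { $merged->{$b} <=> $merged->{$a} || $a cmp $b } keys %$merged;
```

The same idea could be applied directly in the main loop by using lc($typehead) as the hash key, at the cost of losing the original spelling.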