Headers in Usenet messages

02/08/01

This study covers messages from various newsgroups that I read. I exported them in the #! rnews format and wrote a small Perl program to analyse the headers they contain and compute statistics on them. The format is simply a text file in which the messages are separated by a line starting with #! rnews followed by a number (in the standard rnews batch format, the length of the article in bytes). Parsing is therefore fairly simple: the lines after the separator are headers, up to the first empty line, which separates the headers from the body of the message.
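To make the layout concrete, here is a minimal sketch that splits a batch into messages and separates each header block from its body. It assumes that the number after #! rnews is the article length in bytes, as in the standard rnews batch format; split_batch and the two sample messages are hypothetical and are not part of my script.

```perl
use strict;
use warnings;

# Split an rnews batch (passed as a string) into a list of articles.
# Assumption: the number after "#! rnews" is the article length in bytes.
sub split_batch {
    my ($batch) = @_;
    my @articles;
    open(my $fh, '<', \$batch) or die "Cannot read batch: $!";
    while (defined(my $line = <$fh>)) {
        next unless $line =~ /^#! rnews (\d+)/;
        my $length  = $1;
        my $article = '';
        read($fh, $article, $length);
        # Headers end at the first empty line; the rest is the body.
        my ($head, $body) = split(/\n\n/, $article, 2);
        push @articles, { head => $head, body => defined $body ? $body : '' };
    }
    close($fh);
    return @articles;
}

# Hypothetical sample data; lengths are computed rather than typed by hand.
my @raw = ("From: a\@b.c\nSubject: hi\n\nhello\n", "From: x\@y.z\n\nbye\n");
my $batch = join '', map { "#! rnews " . length($_) . "\n" . $_ } @raw;

my @articles = split_batch($batch);
printf "%d articles\n", scalar @articles;
```

Reading by length rather than by line means a body line that happens to start with #! rnews is not mistaken for a separator.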

My Perl implementation counts the occurrences of each header and stores up to three distinct example values for each one. The result is printed as an HTML table that can be used directly:

#!/usr/bin/perl
use strict;
use warnings;

my $count = 0;   # number of messages seen
my %dep;         # header name => number of occurrences
my %expl;        # header name => up to three distinct example values

foreach my $file (glob("news/*")) {
  open(my $in, '<', $file) or die "Cannot open $file: $!";
  my $inhead = 0;
  while (my $ligne = <$in>) {
    if (index($ligne, '#! rnews') == 0) {
      # separator line: a new message starts, and with it a header block
      $count++;
      $inhead = 1;
    } elsif ($inhead and $ligne eq "\n") {
      # empty line: end of the headers, start of the body
      $inhead = 0;
    } elsif ($inhead and index($ligne, ":") > 2) {
      # header line: split on the first ": " only, so that values
      # which themselves contain ": " are kept whole
      chomp($ligne);
      my ($typehead, $value) = split(/: /, $ligne, 2);
      $value = '' unless defined $value;
      $dep{$typehead}++;
      # escape HTML special characters ("&" first)
      $value =~ s/&/&amp;/g;
      $value =~ s/</&lt;/g;
      $value =~ s/>/&gt;/g;
      # keep the first three distinct non-empty values as examples
      my $examples = $expl{$typehead} ||= [];
      if ($value ne '' and @$examples < 3 and !grep { $_ eq $value } @$examples) {
        push @$examples, $value;
      }
    }
  }
  close($in);
}

print "Process on $count messages\n\n";
print "<table border=1>";
# one row per header, most frequent first
foreach my $name (sort { $dep{$b} <=> $dep{$a} } keys %dep) {
  print "<tr><td>$name</td><td>$dep{$name}</td><td><ul>\n";
  print "<li>$_</li>\n" for @{ $expl{$name} };
  print "</ul></td></tr>\n";
}
print "</table>";
exit(0);
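The script works on physical lines, so the continuation lines of a folded header (lines starting with a space or a tab) are not attached to the header they belong to. This is probably why a value such as "text/plain;" shows up on its own in the results. A minimal sketch of unfolding a header block before counting; unfold_headers is a hypothetical helper and the sample block is made up for illustration.

```perl
use strict;
use warnings;

# Join folded header lines: a line starting with a space or a tab
# continues the previous header.
sub unfold_headers {
    my ($head) = @_;
    $head =~ s/\r?\n[ \t]+/ /g;      # glue continuation lines to their header
    return split(/\r?\n/, $head);    # one list element per logical header
}

# Hypothetical header block with one folded Content-Type header.
my $head = "Content-Type: text/plain;\n\tcharset=ISO-8859-1\nLines: 11\n";
my @fields = unfold_headers($head);
print "$_\n" for @fields;
```

Each element of @fields can then be split on the first ": " exactly as in the script.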

I get the following results, sorted by decreasing frequency of occurrence:

Process on 2176 messages

Lines (2176)
  • 11
  • 66
  • 107
Newsgroups (2176)
  • alt.acorn.adverts
  • alt.acorn.adverts,comp.sys.acorn.hardware,comp.sys.acorn.networking,comp.sys.acorn.misc
  • alt.acorn.adverts,alt.ads
Message-ID (2176)
  • <9jres2$nhi$1@plutonium.btinternet.com>
  • <ab774da14a.roger@nifty.demon.co.uk>
  • <9k3g5c$7c$37@usenet.otenet.gr>
Subject (2176)
  • RISCPC for sale
  • FS
  • 10,000,000 registered members with thousands of sexy photos! It's FREE
From (2176)
  • "Peter Howard" <jodrell@btinternet.com>
  • Roger W Wylde <roger@niftysoftware.co.uk>
  • "4Free Teen" <bm-oddtlo@otenet.gr>
Date (2176)
  • Fri, 27 Jul 2001 11:24:20 +0100
  • Sat, 28 Jul 2001 20:38:05 +0100
  • Sat, 28 Jul 2001 19:59:07 +0300
Path (2176)
  • wanadoo.fr!opentransit.net!news.tele.dk!212.74.64.35!colt.net!dispose.news.demon.net!demon!btnet-peer0!btnet-feed5!btnet!mendelevium.btinternet.com!not-for-mail
  • wanadoo.fr!freenix!skynet.be!dispose.news.demon.net!news.demon.co.uk!demon!nifty.demon.co.uk%roger
  • wanadoo.fr!opentransit.net!news.tele.dk!213.204.128.162!news000.worldonline.se!news-out.nuthinbutnews.com!propagator!feed2.newsfeeds.com!newsfeeds.com!news-in-austin.nuthinbutnews.com!newsfeed-zh.ip-plus.net!news.ip-plus.net!Amsterdam.Infonet!News.Amsterdam.UnisourceCS!news.otenet.gr!not-for-mail
NNTP-Posting-Host (2040)
  • host62-7-95-184.btinternet.com
  • nifty.demon.co.uk
  • volo530-a204.otenet.gr
X-Trace (1992)
  • news.demon.co.uk 996349266 nnrp-08:18456 NO-IDENT nifty.demon.co.uk:158.152.44.121
  • usenet.otenet.gr 996492268 236 62.103.224.204 (30 Jul 2001 11:24:28 GMT)
  • typhoon.kc.rr.com 996532877 24.130.237.200 (Mon, 30 Jul 2001 17:41:17 CDT)
Organization (1982)
  • BT Internet
  • Nifty Software
  • An OTEnet S.A. customer
References (1775)
  • <9k9jsu$52v$1@plutonium.btinternet.com>
  • <9js4fb$jr1k$1@rn.area.com>
  • <vjvqkto2vajpj26kql5vvn1kl4g0g1qm9g@4ax.com>
X-Complaints-To (1567)
  • abuse@demon.net
  • abuse@otenet.gr
  • abuse@rr.com
User-Agent (1036)
  • Messenger-Pro/2.50 (MsgServe/1.50) (RISC-OS/4.02)
  • slrn/0.9.6.3 (Linux)
  • MacSOUP/D-2.4.6 (unregistered)
NNTP-Posting-Date (989)
  • Mon, 30 Jul 2001 11:24:28 +0000 (UTC)
  • Mon, 30 Jul 2001 17:41:17 CDT
  • 1 Aug 2001 07:08:41 GMT
X-Newsreader (711)
  • Microsoft Outlook Express 6.00.2462.0000
  • Microsoft Outlook Express 5.50.4133.2400
  • Microsoft Outlook Express 6.00.2479.0006
Content-Type (632)
  • text/plain;
  • text/plain; charset=us-ascii
  • text/plain; charset=ISO-8859-1
Content-Transfer-Encoding (503)
  • 7bit
  • 8bit
  • 7Bit
Xref (428)
  • Jesus.wanadoo.fr alt.acorn.adverts:155 comp.sys.acorn.hardware:7425 comp.sys.acorn.networking:1776 comp.sys.acorn.misc:10613
  • Jesus.wanadoo.fr comp.sys.acorn.announce:513 comp.sys.acorn.apps:7456 comp.sys.acorn.games:960 comp.sys.acorn.hardware:7422 comp.sys.acorn.misc:10612 comp.sys.acorn.networking:1775
  • Jesus.wanadoo.fr comp.sys.acorn.apps:7559 comp.sys.acorn.misc:10733
Reply-To (371)
  • "Jon Kreski" <webmaster@clickfordollars.com>
  • "Web Wizards - PHP/PERL Developers" <mh2@isis.co.za>
  • gapope@vcn.bc.ca
MIME-Version (313)
  • 1.0
X-MimeOLE (307)
  • Produced By Microsoft MimeOLE V6.00.2462.0000
  • Produced By Microsoft MimeOLE V5.50.4133.2400
  • Produced By Microsoft MimeOLE V6.00.2479.0006
X-Priority (300)
  • 3
  • 911
  • 3
X-MSMail-Priority (298)
  • Normal
  • Normal
Mime-Version (284)
  • 1.0
  • 1.0 (WebTV)
X-NNTP-Posting-Host (282)
  • nifty.demon.co.uk:158.152.44.121
  • zaynar.demon.co.uk:158.152.90.16
  • xemu.demon.co.uk:158.152.196.209
X-Accept-Language (192)
  • en
  • po fene,ru
  • en,en-US,en-GB,ja,af
X-Mailer (185)
  • Hustler Mailer
  • Mozilla 4.78 [en] (Windows NT 5.0; U)
  • Mozilla 4.75 [en] (Win98; U)
Approved (162)
  • Self-Moderation <authoring-cgi@boutell.com>
  • nick
  • a.m.conroy@argonet.co.uk (Andrew Conroy)
X-Orig-Path (159)
  • btinternet.com!domino_
  • FreeNet.co.uk%davehigton
  • steve-c.co.uk!stephen
X-Original-NNTP-Posting-Host (156)
  • 203.29.154.54
  • 62.60.47.177
  • 62.60.43.202
X-Posting-Agent (150)
  • Playboy Poster
  • Hamster/1.3.22.0
  • Hamster/1.3.21.0
X-Editor (102)
  • Zap, using ZapEmail 0.22 (27 Nov 1998) patch-3
  • Zap 1.44 (19 May 2000) [TEST 3], ZapEmail 0.25 (18 Mar 2000) test-2
  • Zap 1.44 (06 Jul 2001) [TEST 6], ZapEmail 0.26 (14 Jun 2001) test-2
Sender (85)
  • Ian
  • resurrector@mindspring.com (Guido the Resurrector)
  • Greg Mildenhall <gregm@pc-121.cs.uwa.edu.au>
X-Sender (72)
  • 510035616940-0001@t-dialin.net
  • 340067917323-0001@t-dialin.net
  • 520042741018-0001@t-dialin.net
Distribution (70)
  • world
X-Admin (41)
  • news@aol.com
  • news@cs.com
Followup-To (37)
  • comp.sys.acorn.hardware
  • comp.sys.acorn.misc
  • comp.sys.acorn.apps
Cache-Post-Path (32)
  • queeg.ludd.luth.se!unknown@ny.sm.luth.se
  • newsreader-hpw1.net.bms.com!unknown@a048641-dyp1164642dys.war.zim.bms.com
  • ananke.eclipse.net.uk!unknown@212.104.153.172
X-Abuse-Info (32)
  • Please be sure to forward a copy of ALL headers
  • Otherwise we will be unable to process your complaint properly.
  • Otherwise we will be unable to process your complaint properly
X-No-Archive (31)
  • yes
  • Yes
X-Cache (31)
  • nntpcache 2.4.0b5 (see http://www.nntpcache.org/)
  • nntpcache 2.3.3 (see http://www.nntpcache.org/)
  • nntpcache 2.4.0b4 (see http://www.nntpcache.org/)
X-Processed (29)
  • Monty (v.1.24)
X-Report (22)
  • Report abuse to nntpabuse@vip.uk.com
  • Report abuse to abuse@netscapeonline.co.uk
  • Please report illegal or inappropriate use to <abuse@newsfeeds.com>
X-NNTP-Poster (21)
  • NewsHound v1.33
  • NewsHound v1.35ß
X-Comments (20)
  • GtR Repost
  • than likely by someone other than the original poster. Please
  • see the end of this posting for a copy of the cancel.
X-Server-Date (18)
  • 29 Jul 2001 01:52:44 GMT
  • 29 Jul 2001 01:39:54 GMT
  • 29 Jul 2001 01:44:35 GMT
Mail-Copies-To (16)
  • never
  • nobody
In-Reply-To (15)
  • <m6a97.1974$e%4.50598@news3.oke.nextra.no>
  • <4aa0e2fec3stephen@steve-c.co.uk>
  • <4aa0a3f849kell@locsource.com>
X-OS (13)
  • RISC OS 4.02
X-Received-Date (13)
  • Thu, 26 Jul 2001 22:39:22 PDT (newsmaster1.prod.itd.earthlink.net)
  • Tue, 31 Jul 2001 21:05:02 PDT (newsmaster1.prod.itd.earthlink.net)
  • Tue, 31 Jul 2001 21:33:29 PDT (newsmaster1.prod.itd.earthlink.net)
X-Face (12)
  • 1+aL2.S<C:UD\AQBwLzd95j+NaYgl`9_jr8LEE;:a5UmNGxU=-]*hm]>&^e[b
  • Exf_JoskzTiRK!78R)Ouvl"U|NO&qXO,Dyo}f#}N<"zE`~w~8&&s8X@^N7sr-nEz`ro>R1CMuB{\5M9I[/l)`UT(k0!Ow?K\Fg+=Fo@%*'-@Ih`7rzN5*B[rT<,Ap;L.Pl~dirqVy$im!bo>I@ew*i[EE,T*0$jcxp41
  • B*S,_h,:U<<2\9CoqZ+"Z<es;.M'/Y[DUX2PziYd.G:.W!"f9RVlov#B+6:~r
X-X-Sender (12)
  • <kimotol@uhunix5>
  • <rlehy@chimay>
X-Computer (11)
  • Acorn RiscPC 700, StrongARM 267MHz, 24+2MB RAM, 4.2GB+850MB HD
  • Acorn RiscPC SA-110
X-Original-Trace (10)
  • 30 Jul 2001 22:00:48 +1000, 203.29.154.54
  • 30 Jul 2001 22:15:17 +1000, 203.29.154.54
  • 28 Jul 2001 18:05:54 +1000, 202.138.17.52
X-Comments3 (10)
  • IMPORTANT
X-Comments2 (10)
  • IMPORTANT
Keywords (10)
  • ALL YOUR BASE ARE BELONG TO US
  • nihilism, random, chaos
  • Acorn Guide Newsgroups FAQs Newcomers
Expires (10)
  • Sat, 4 Aug 2001 12:01:41 +1200
  • Sat, 18 Aug 2001 03:43:01 GMT
  • Sat, 18 Aug 2001 03:43:11 GMT
X-Mangled (9)
  • using ZapEmail's AST (ygolonhceT mapS-itnA)
X-MIMEOLE (9)
  • Produced By Microsoft MimeOLE V5.50.4522.1200
  • Produced By Microsoft MimeOLE V4.72.3110.3
  • Produced By Microsoft MimeOLE V5.00.2615.200
X-no-markup (8)
  • yes
X-Organization (7)
  • Alpha Programming
  • Photodesk Ltd, Portland. Telephone +44 (0) 1305 822753
  • Warm Silence Software Ltd
X-NFilter (7)
  • 1.2.1-b1
Cancel-Lock (7)
  • sha1:aB22uhxRqjf0uAkKSNppnsbIgKA=
  • sha1:IZa0KpgyjiD6rjsfEv0Qy9vOMEk=
  • sha1:gDjrOcTtHDj2W4s9VfNOb9E8pUY=
X-HTTP-Posting-Host (6)
  • dyn-213-36-140-110.ppp.libertysurf.fr
  • dyn-213-36-0-248.ppp.libertysurf.fr
  • dyn-213-36-23-201.ppp.libertysurf.fr
Posted-And-Mailed (6)
  • yes
Summary (6)
  • Newcomers' Guide to the Acorn Newsgroups and FAQs.
X-Question (6)
  • Do you really find headers very interesting?
X-Originating-User (5)
  • 144.92.164.196
  • 193.48.70.124
  • 213.19.7.83
X-Originating-Host (5)
  • 144.92.164.196
  • 193.48.70.124
  • 213.19.7.83
X-IRC-Identity (5)
  • ircnet:gerph
Content-transfer-encoding (5)
  • 7bit
  • 8bit
X-System (5)
  • ArcadeLink
X-Comment (5)
  • A hollow voice whispers diaxos
Supersedes (5)
  • <Xns90EBBAFB0140BRiXiDiXi@161.48.128.20>
  • <fr.chartes.comp.lang.perl-995254966.259465@ns2.freenix.org>
  • <fr.chartes.soc.internet-995254974.996785@ns2.freenix.org>
Errors-To (5)
  • PostMaster@arcade.demon.co.uk
X-Complaints-To2 (5)
  • abuse@foorum.com
  • abuse@foorum.fr
Mail-Reply (4)
  • 1
X-Forwarded (4)
  • by - (DeleGate/7.3.0)
  • by - (DeleGate/6.1.20)
X-Authenticated-User (4)
  • xtmlnews
  • web17
X-no-Archive (3)
  • Yes
X-DMCA-Notifications (3)
  • http://www.giganews.com/info/dmca.html
X-Foorum_user_id (3)
X-POSTER (3)
  • foorum.com
X-distributed-net-Team (3)
  • Acorn Users Group (#4266)
NNTP-Posting-User (3)
  • pck
  • sbly
  • lperek
X-PGP-Sig (2)
  • 6.5.8 From,Newsgroups,Subject,Message-ID
X-Repost-Date (2)
  • 31 Jul 2001 22:30:01 GMT
  • 31 Jul 2001 22:45:26 GMT
Filter-X-Trace (2)
  • 996710666 newscene.com 17922 216.40.21.80
  • 996742968 newscene.com 17934 216.40.21.80
Content-Disposition (2)
  • Inline
X-URL (2)
  • http://habett.org/
  • http://www.alcyone.com/buffoon/
X-Original-Path (2)
  • sn-us!sn-xit-02!supernews.com!news.tele.dk!209.30.0.50!nntp.flash.net!easynews!e420r-sjo4.usenetserver.com!newsfeed.usenetserver.com!e420r-sjo3.usenetserver.com.POSTED!not-for-mail
  • sn-us!sn-xit-01!supernews.com!feeder.qis.net!dispose.news.demon.net!demon!xara.net!gxn.net!server6.netnews.ja.net!server4.netnews.ja.net!news5-gui.server.ntli.net!ntli.net!news6-win.server.ntlworld.com.POSTED!not-for-mail
X-Mimeole (2)
  • Produced By Microsoft MimeOLE V4.72.3110.3
  • Produced By Microsoft MimeOLE V5.00.3018.1300
X-Attribution (2)
  • T.A.
X-Reposted-By (2)
  • resurrector@mindspring.com (Guido the Resurrector)
X-WebTV-Signature (2)
  • 1
X-Original-Message-ID (2)
  • <bp497.10244$Fe4.449169@e420r-sjo3.usenetserver.com>
  • <uSt97.46436$SK6.6014656@news6-win.server.ntlworld.com>
Filter-NNTP-Posting-Host (2)
  • 216.40.21.80
X-Organisation (2)
  • Somewhat chaotic
X-PGP-Keys (2)
  • 0x53E9615A(DH/DSS),0x97B81D97(RSA) @ http://autechre.de/keys.asc
X-No-Productlink (2)
  • yes
Mime-version (1)
  • 1.0
Followupto (1)
  • Poster
X-Msmail-Priority (1)
  • Normal
X-PGP-fingerprint (1)
  • 6A61 3673 55ED C547 0C5A 4ABD 44F8 8CB2 A125 E138
X-No-Productlinks (1)
  • Yes
X-sekritcode (1)
  • aqjwue
X-BBC-Trace (1)
  • MTMyLjE4NS4yMTAuMTc2
X-Proxy-Client (1)
  • marriott@uiuc.edu from ppp1-10.cu.soltec.net
X-c't-Krypto-Kampagne (1)
  • http://www.heise.de/ct/pgpCA/
X-Processed-By (1)
  • BudgieSoft NewsFudge v1.17
Return-Receipt-To (1)
  • tim@southfrm.demon.co.uk
Archive (1)
  • no
X-Cricket (1)
  • Well, let's look on the bright side. It might rain for five days.
X-User-Info (1)
  • 203.167.236.226 203.167.236.226 steve.knutson
X-mang (1)
  • yes
X-Hello-Kitty (1)
  • meow meow.
X-Brought-To-You-By (1)
  • The letter Q and the number 12
X-Peer2Peer-Protocol (1)
  • Dummies to dummies
X-Posted-By (1)
  • fart 1.3
X-PGP-key-at (1)
  • http://users.durge.org/~ngb/key.gpg.asc
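
Several of the rows above differ only in capitalisation, for example MIME-Version, Mime-Version and Mime-version, or X-No-Archive and X-no-Archive. Header names are case-insensitive, so these counts could be merged. A minimal sketch, assuming a hash of counts shaped like %dep in the script; merge_case is a hypothetical helper, and the sample counts are copied from the table.

```perl
use strict;
use warnings;

# Merge counts of header names that differ only in case.
# Keys of the returned hash are the lowercased header names.
sub merge_case {
    my (%count) = @_;
    my %merged;
    $merged{lc $_} += $count{$_} for keys %count;
    return %merged;
}

# Counts taken from the results table.
my %dep = (
    'MIME-Version' => 313, 'Mime-Version' => 284, 'Mime-version' => 1,
    'X-No-Archive' => 31,  'X-no-Archive' => 3,
);

my %merged = merge_case(%dep);
printf "%s %d\n", $_, $merged{$_} for sort keys %merged;
```

With these sample counts, the three spellings of MIME-Version add up to 598 occurrences and the two spellings of X-No-Archive to 34.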