Re: Corpora: sgml detagger

From: Danko Sipka (
Date: Tue Apr 16 2002 - 20:31:35 MET DST

    This Perl script should do the job:

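    print "What is your input file name:\n";
    chomp($infile = <STDIN>);    # read the file name typed at the prompt
    open IN, $infile or die "No file, no fun!";
    open OUT, ">$infile.out" or die "No file, no fun!";
    while (<IN>) {
        s/<[^>]*>//g;            # assumed pattern: delete each <...> tag that opens and closes on this line
        print OUT "$_";
    }
    close (IN) or die "D'oh!";
    close (OUT) or die "D'oh!";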
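
    If some tags run over a line break, a pattern applied one line at a time
    will miss them. Here is a minimal sketch, assuming the file fits in memory,
    that reads the whole text at once and strips the tags in a single pass.
    The sub name detag and the file names in the usage line are only
    illustrative:

```perl
use strict;
use warnings;

# Hypothetical helper: remove anything of the form <...>, even across line breaks.
sub detag {
    my ($text) = @_;
    $text =~ s/<[^>]*>//g;   # [^>] also matches newlines, so multi-line tags are removed too
    return $text;
}

# Usage (illustrative file names): perl detag.pl input.sgml > input.txt
if (@ARGV) {
    local $/;                # slurp mode: read the whole file in one go
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $text = <$in>;
    close $in;
    print detag($text);
}
```

    This is a rough filter, not an SGML parser: a ">" inside an attribute value
    or a comment will confuse it, and entities such as &amp; are left as they are.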


    Danko Sipka

      ----- Original Message -----
      From: Tine & Colleen
      Sent: Tuesday, April 16, 2002 8:13 PM
      Subject: Corpora: sgml detagger

      I am compiling a corpus for research purposes, and some of the texts are SGML-tagged.
      Does anybody know an easy way to remove the tags and save the texts as 'raw' .txt files?
      Maybe a Perl script?

      Thanks in advance

      Tine Lassen

