Re: Corpora: a program needed

From: David Graff (graff@unagi.cis.upenn.edu)
Date: Thu May 30 2002 - 16:35:22 MET DST


    Sampo,

    The command-line Perl script I sent you earlier (which I failed to copy
    to the list) could actually be expressed more briefly. Again, assuming
    that the data is already tokenized to one word token per line, this
    replaces each token with an integer ID assigned in order of first
    appearance:

    cat token.stream | \
     perl -pe 's/(\S+)/exists($t{$1}) ? $t{$1}:($t{$1}=++$tc)/e'

        Best regards,

            Dave Graff


