Re: Corpora: sgml detagger

From: William H. Fletcher (fletcher@usna.edu)
Date: Wed Apr 17 2002 - 14:56:47 MET DST

    I have posted an SGML / HTML tag stripper for Windows at
    http://kwicfinder.com/StripTags.zip . It removes everything between each
    < and the next > , so it can fail in those rare cases in which a > is
    embedded within a comment or an attribute value. It also does not
    translate HTML entities (e.g. &eacute; --> é); I'll be glad to add that
    feature and / or support for command line operation with wildcards if
    anyone requests it. Tine reports that this program "seems to do the trick".
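
The failure case described above can be illustrated with a short Python sketch. It is not the code of the posted Windows program, whose internals are not shown here. The function names (strip_naive, strip_with_parser, translate_entities) are made up for illustration. The first function removes everything from each < to the next > and so trips over an embedded > ; the second leans on Python's standard html.parser module, which is aimed at HTML rather than SGML in general, to skip comments and quoted attribute values and to translate entities along the way.

```python
import re
from html import unescape
from html.parser import HTMLParser


def strip_naive(text):
    """Remove everything from each '<' up to the next '>' (hypothetical helper)."""
    return re.sub(r"<[^>]*>", "", text)


def translate_entities(text):
    """Translate HTML character references, e.g. '&eacute;' to 'é'."""
    return unescape(text)


class _TextCollector(HTMLParser):
    """Collects character data only; tags and comments are ignored."""

    def __init__(self):
        # convert_charrefs=True makes the parser translate entities in text.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_with_parser(text):
    """Strip tags with a real parser, so '>' inside a comment or a
    quoted attribute value does not end the tag early."""
    collector = _TextCollector()
    collector.feed(text)
    collector.close()  # flush any text still buffered by the parser
    return "".join(collector.parts)


if __name__ == "__main__":
    sample = '<a title="x > y">caf&eacute;</a><!-- a > b -->!'
    print(repr(strip_naive(sample)))        # leftovers from the embedded '>'
    print(repr(strip_with_parser(sample)))  # 'café!'
```

One caveat with this sketch: the parser treats the contents of script and style elements as character data, so those would still need to be filtered out separately.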

    Regards,
    Bill Fletcher

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

      William H. Fletcher 410.293.6362 [voice]
      Associate Professor, German & Spanish 410.293.2729 [fax]
      Language Studies Department
      US Naval Academy
      589 McNair Road
      Annapolis, MD 21402 - 5030

      fletcher@usna.edu
      http://www.usna.edu/LangStudy/
      http://kwicfinder.com/
      http://miniappolis.com/

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -


