[Corpora-List] Free n-gram Software Released

From: William H. Fletcher (fletcher@usna.edu)
Date: Mon Sep 30 2002 - 23:39:32 MET DST


    The recent flurry of discussion on n-gram software inspired me to revisit a
    project from last year. I reprogrammed kfNgram using aspects of the
    "suffix array" approach described by Mikio Yamamoto and Kenneth W. Church
    and further developed by Chunyu Kit and Yorick Wilks. The result was a
    quantum leap in performance, which makes it practical even for large corpora.
    (It indexes the 25-million-word CETENFolha corpus announced here last week
    in about 10 minutes on my 800 MHz Pentium III machine with 256 MB RAM, and
    then cranks out n-gram files in under a minute.)
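To illustrate why a suffix array helps with n-gram counting, here is a minimal sketch of the general idea. It is not kfNgram's code, and the function names `build_suffix_array` and `count_ngrams` are hypothetical. The sketch sorts every suffix of a word sequence so that all occurrences of the same n-gram become adjacent, then counts each contiguous run in a single pass. The published method of Yamamoto and Church goes further, using longest-common-prefix information to handle substrings of all lengths efficiently. The naive sort used here is only suitable for toy input.

```python
from itertools import groupby


def build_suffix_array(tokens):
    """Return the start positions of all suffixes of `tokens`, sorted lexicographically.

    Naive construction that compares list slices: fine for a demo, far too
    slow and memory-hungry for a corpus of millions of words.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def count_ngrams(tokens, suffix_array, n):
    """Count every n-gram by scanning the suffix array once.

    All suffixes that begin with the same n-gram are adjacent in sorted
    order, so each n-gram's frequency is the length of one contiguous run.
    The result comes out already sorted by n-gram.
    """
    # Suffixes shorter than n tokens cannot start an n-gram, so skip them.
    prefixes = (tuple(tokens[i:i + n]) for i in suffix_array if len(tokens) - i >= n)
    return [(gram, sum(1 for _ in run)) for gram, run in groupby(prefixes)]


if __name__ == "__main__":
    tokens = "to be or not to be".split()
    sa = build_suffix_array(tokens)
    # Once the index is built, n-grams of any length come from the same array.
    for gram, freq in count_ngrams(tokens, sa, 2):
        print(" ".join(gram), freq)
```

The expensive step is building the index. After that, producing an n-gram list for any n is a linear scan, which matches the pattern described above: a slow one-time indexing pass, followed by fast n-gram output.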

    kfNgram supports user-defined character sets and sort orders, and its GUI
    (graphical user interface) makes it accessible even to casual users.

    This free Windows program is available at
    http://miniappolis.com/KWiCFinder/kfNgramHelp.html
    Suggestions and comments on its usability and performance will be greatly
    appreciated.

    Bill Fletcher

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

      William H. Fletcher 410.293.6362 [voice]
      Associate Professor, German & Spanish 410.293.2729 [fax]
      Language Studies Department
      US Naval Academy
      589 McNair Road
      Annapolis, MD 21402 - 5030

      fletcher@usna.edu
      http://www.usna.edu/LangStudy/
      http://kwicfinder.com/
      http://miniappolis.com/

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -



    This archive was generated by hypermail 2b29 : Mon Sep 30 2002 - 23:47:53 MET DST