Corpora: bigram statistics package v0.5

From: Ted Pedersen
Date: Tue Jun 04 2002 - 21:10:22 MET DST


    BSP is now NSP!

    Version 0.5 of the Bigram Statistics Package is now available, and
    has been renamed the N-gram Statistics Package (NSP v0.5).

    NSP is an easy-to-use suite of Perl tools for counting and analyzing
    word n-grams in text. It provides a number of standard tests of
    association that can be used to identify word n-grams in large corpora,
    and also allows users to easily implement other tests without knowing
    very much about Perl at all.
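
    As a rough illustration of what counting n-grams and scoring them with
    a test of association involves, here is a minimal sketch in Python. It
    is not NSP's code or interface: the function names count_ngrams and pmi
    are hypothetical, and pointwise mutual information is used only as one
    familiar example of an association measure, not as a statement of which
    tests NSP ships.

```python
import math
from collections import Counter


def count_ngrams(tokens, n=2):
    """Count contiguous word n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def pmi(bigram_counts, w1, w2):
    """Pointwise mutual information (log base 2) of the bigram (w1, w2).

    Marginal counts are taken from the bigram table itself: how often w1
    occurs in first position and how often w2 occurs in second position.
    """
    total = sum(bigram_counts.values())
    joint = bigram_counts[(w1, w2)]
    if joint == 0:
        return float("-inf")  # the pair was never observed
    left = sum(c for (a, _), c in bigram_counts.items() if a == w1)
    right = sum(c for (_, b), c in bigram_counts.items() if b == w2)
    return math.log2(joint * total / (left * right))


tokens = "new york is a new city in new york state".split()
bigrams = count_ngrams(tokens, n=2)
score = pmi(bigrams, "new", "york")
```

    A higher score suggests the two words occur together more often than
    their individual frequencies would predict. Passing n=3 to count_ngrams
    gives trigram counts in the same way.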

    Earlier versions of this package were known as the Bigram Statistics
    Package (BSP v0.1, v0.3, v0.4) and dealt exclusively with word bigrams
    (two word sequences). NSP v0.5 is backwards compatible with these
    earlier versions, and adds support for word n-grams.

    Also new in v0.5 are user-defined tokenization via regular expressions,
    stop lists, and an extensive collection of test and sample scripts.
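
    To give a feel for regular-expression tokenization combined with a stop
    list, here is a small Python sketch. It does not reproduce NSP's options
    or file formats: the names tokenize and drop_stopped_ngrams are
    hypothetical, and the policy shown (discard an n-gram only when every
    word in it is a stop word) is an assumption chosen for illustration.

```python
import re


def tokenize(text, token_pattern=r"\w+"):
    """Split text into lowercase tokens matching a user-supplied regex.

    The pattern should not contain capturing groups, so that re.findall
    returns plain strings.
    """
    return [tok.lower() for tok in re.findall(token_pattern, text)]


def ngrams(tokens, n=2):
    """Return the contiguous n-grams of a token list, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def drop_stopped_ngrams(grams, stop_words):
    """Drop n-grams made up entirely of stop words (assumed policy)."""
    return [g for g in grams if not all(w in stop_words for w in g)]


tokens = tokenize("The cat sat on the mat.")
kept = drop_stopped_ngrams(ngrams(tokens), stop_words={"the", "on"})
```

    Changing token_pattern changes what counts as a word; for example,
    r"[A-Za-z]+(?:'[a-z]+)?" would keep contractions such as "don't" as
    single tokens.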

    This is free software. Download it (or view the README) at:


    Ted Pedersen
    Department of Computer Science
    University of Minnesota Duluth
    Duluth, MN 55812
    (218) 726-8770
