Corpora: wsd software available

From: ted pedersen
Date: Tue Feb 05 2002 - 20:06:12 MET

    We are happy to announce the availability of the complete source code
    distribution for the Duluth systems that participated in the Senseval-2
    comparative exercise among word sense disambiguation systems. This is
    free software, distributed under the GNU CopyLeft.

    This includes a number of components:

    SenseTools (v0.1), a suite of Perl programs that convert sense-tagged
    text into a feature vector representation suitable for use with the Weka
    machine learning system. Users may specify features to be identified in
    the text using regular expressions, or features may be automatically
    identified using the Bigram Statistics Package (v0.4 or later), which
    is also available.
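
    As a rough illustration of the idea (not the SenseTools code or its
    interface, which is written in Perl), the Python sketch below turns a
    few sense-tagged contexts into Weka's ARFF input format, using one
    binary attribute per user-supplied regular expression. The function
    name to_arff, the feature names, the sense labels, and the example
    sentences are all hypothetical, and binary features are an assumption.

```python
import re


def to_arff(relation, instances, feature_patterns):
    """Build an ARFF document from sense-tagged contexts.

    instances: list of (context_text, sense_label) pairs.
    feature_patterns: dict mapping a feature name to a regular expression.
    Each feature is 1 if its pattern matches the context, otherwise 0.
    Sense labels are assumed to contain no spaces or commas.
    """
    compiled = {name: re.compile(pattern, re.IGNORECASE)
                for name, pattern in feature_patterns.items()}
    senses = sorted({sense for _, sense in instances})

    # Header: one binary attribute per pattern, plus the nominal class.
    lines = ["@relation " + relation, ""]
    for name in compiled:
        lines.append("@attribute " + name + " {0,1}")
    lines.append("@attribute sense {" + ",".join(senses) + "}")
    lines += ["", "@data"]

    # Data: one comma-separated row per tagged context.
    for text, sense in instances:
        values = ["1" if rx.search(text) else "0" for rx in compiled.values()]
        lines.append(",".join(values + [sense]))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical training contexts for the ambiguous word "bank".
    examples = [
        ("He deposited the check at the bank", "bank_finance"),
        ("They fished from the river bank", "bank_river"),
    ]
    patterns = {
        "has_money_word": r"\b(deposit\w*|check|loan)\b",
        "has_river_word": r"\b(river|shore|fish\w*)\b",
    }
    print(to_arff("bank", examples, patterns))
```

    The resulting text can be saved to a .arff file and loaded into Weka
    for training and evaluation.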

    Duluth-Shell, a set of C-shell scripts that tie together the Bigram
    Statistics Package, SenseTools, and Weka. These scripts should allow a
    user to easily replicate the Duluth systems from Senseval-2, and they
    provide a convenient starting point for further experimentation with
    corpus-based, machine-learning-oriented methods.

    You can find SenseTools, Duluth-Shell, the Bigram Statistics Package, and
    a pointer to Weka (which was developed at the University of Waikato) at

    Please let us know if you have any questions.


    # Ted Pedersen                                                           #
    # Department of Computer Science                                         #
    # University of Minnesota, Duluth                                        #
    # Duluth, MN 55812                                        (218) 726-8770 # 
