Corpora: wsd software available

From: Ted Pedersen (tpederse@d.umn.edu)
Date: Tue Feb 05 2002 - 20:06:12 MET

We are happy to announce the availability of the complete source code
distribution for the Duluth systems that participated in the Senseval-2
comparative exercise among word sense disambiguation systems. This is
free software, distributed under the GNU CopyLeft.

The distribution includes the following components:

SenseTools (v0.1), a suite of Perl programs that convert sense-tagged
text into a feature vector representation suitable for use with the Weka
machine learning system. Users may specify the features to be identified
in the text using regular expressions, or features may be identified
automatically using the Bigram Statistics Package (v0.4 or later), which
is also available.
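
To give a rough idea of what "feature vector representation" means here,
the following is a minimal sketch in Python. It is not the SenseTools
code or its interface (SenseTools is written in Perl); the instances,
the feature names, the regular expressions, and the to_arff helper are
all hypothetical. It only illustrates the general idea of turning
sense-tagged contexts into binary features, written out in Weka's ARFF
format.

```python
import re

# Hypothetical sense-tagged instances: (sense label, context text).
INSTANCES = [
    ("financial", "she deposited the check at the bank on Monday"),
    ("river", "they fished from the muddy bank of the river"),
]

# Hypothetical user-specified features, each defined by a regular expression.
FEATURES = {
    "has_money_word": r"\b(deposit\w*|check|loan|account)\b",
    "has_water_word": r"\b(river|muddy|fish\w*|shore)\b",
}


def to_arff(instances, features, relation="wsd"):
    """Return an ARFF document with one binary attribute per feature."""
    senses = sorted({sense for sense, _ in instances})
    lines = [f"@relation {relation}", ""]
    for name in features:
        lines.append(f"@attribute {name} {{0,1}}")
    # The class attribute lists every sense seen in the data.
    lines.append("@attribute sense {" + ",".join(senses) + "}")
    lines += ["", "@data"]
    for sense, text in instances:
        # A feature is 1 if its pattern occurs anywhere in the context.
        values = ["1" if re.search(pattern, text) else "0"
                  for pattern in features.values()]
        lines.append(",".join(values + [sense]))
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_arff(INSTANCES, FEATURES))
```

Each row of the data section is one instance: a 0/1 value per feature,
followed by the sense tag that a classifier would learn to predict.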

Duluth-Shell, a set of C-shell scripts that tie together the Bigram
Statistics Package, SenseTools, and Weka. These scripts should allow a
user to easily replicate the Duluth systems from Senseval-2, and they
provide a convenient starting point for further experimentation with
corpus-based, machine-learning-oriented methods.

You can find SenseTools, Duluth-Shell, the Bigram Statistics Package, and
a pointer to Weka (which was developed at the University of Waikato) at
http://www.d.umn.edu/~tpederse/senseval2.html

Please let us know if you have any questions.

Enjoy!
Ted

--
# Ted Pedersen                            http://www.d.umn.edu/~tpederse #
# Department of Computer Science                      tpederse@d.umn.edu #
# University of Minnesota, Duluth                                        #
# Duluth, MN 55812                                        (218) 726-8770 #

This archive was generated by hypermail 2b29 : Tue Feb 05 2002 - 20:13:51 MET