[Corpora-List] UCS toolkit (v0.3)

From: Stefan Evert (evert@IMS.Uni-Stuttgart.DE)
Date: Tue Apr 06 2004 - 12:53:08 MET DST


    Dear Colleagues,

    I am happy to announce the availability of two new resources for
    research on the statistical analysis of word cooccurrences.

    1) An On-Line Repository of Association Measures

    Statistical association measures, applied to cooccurrence frequency data
    collected in a contingency table, are the most widely used tool for the
    analysis of word combinations and the extraction of collocations from text
    corpora. Over the years, many different association measures have been
    suggested (mutual information, t-score, the chi-squared test, and Dunning's
    log-likelihood, to name but a few) and used in various applications.

    This on-line resource aims to be a comprehensive repository of association
    measures, including an explanation of the theoretical background of each
    measure, references, some implementation notes, and explicit equations in
    terms of observed and expected frequencies.

      http://www.collocations.de/AM/
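
    As a rough illustration of what "equations in terms of observed and
    expected frequencies" means, the four measures named above can be
    sketched in Python for a single 2x2 contingency table. This is only a
    sketch using common textbook forms of the measures: the class and
    function names are made up for this example, the counts are invented,
    and the repository's own definitions may differ in details such as the
    logarithm base or continuity corrections.

```python
import math
from typing import NamedTuple


class Table(NamedTuple):
    """Observed 2x2 contingency table for a word pair (u, v).

    o11: u with v, o12: u without v, o21: v without u, o22: neither.
    The field names are assumptions made for this sketch.
    """
    o11: float
    o12: float
    o21: float
    o22: float


def expected(t: Table) -> Table:
    """Expected frequencies under independence: E_ij = R_i * C_j / N."""
    n = sum(t)
    r1, r2 = t.o11 + t.o12, t.o21 + t.o22   # row totals
    c1, c2 = t.o11 + t.o21, t.o12 + t.o22   # column totals
    return Table(r1 * c1 / n, r1 * c2 / n, r2 * c1 / n, r2 * c2 / n)


def mutual_information(t: Table) -> float:
    """Pointwise MI, here with a base-2 logarithm (requires o11 > 0)."""
    return math.log2(t.o11 / expected(t).o11)


def t_score(t: Table) -> float:
    """t-score: (O11 - E11) / sqrt(O11) (requires o11 > 0)."""
    return (t.o11 - expected(t).o11) / math.sqrt(t.o11)


def chi_squared(t: Table) -> float:
    """Pearson chi-squared statistic, without continuity correction."""
    e = expected(t)
    return sum((o - x) ** 2 / x for o, x in zip(t, e))


def log_likelihood(t: Table) -> float:
    """Log-likelihood ratio: 2 * sum O_ij * ln(O_ij / E_ij).

    Cells with O_ij = 0 contribute nothing (0 * ln 0 is taken as 0).
    """
    e = expected(t)
    return 2 * sum(o * math.log(o / x) for o, x in zip(t, e) if o > 0)


if __name__ == "__main__":
    example = Table(30, 10, 10, 50)  # invented counts, illustration only
    for measure in (mutual_information, t_score, chi_squared, log_likelihood):
        print(f"{measure.__name__}: {measure(example):.4f}")
```

    All four functions assume non-zero row and column totals. Deriving every
    measure from the same pair of observed and expected tables keeps the
    formulas directly comparable.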

    2) The UCS Toolkit (version 0.3)

    The UCS toolkit is a collection of libraries and scripts for the statistical
    analysis of cooccurrence data. It can be thought of as a simple and highly
    specialised database, storing data sets of word pairs and frequency
    information in a tabular format in plain (compressed) text files. The data
    sets can be viewed, printed, manipulated in various ways, annotated with
    association scores, ranked, and sorted. In addition, there are library
    functions for the graphical evaluation of association measures in a
    collocation extraction task.
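
    To make the "annotate, rank, and sort" workflow concrete, here is a
    minimal Python sketch. It does not use the UCS toolkit or its file
    format: the record layout (joint frequency f, marginal frequencies f1
    and f2, plus a sample size), all names, and all counts are assumptions
    invented for this illustration.

```python
import math
from typing import Callable, List, NamedTuple, Tuple


class PairRecord(NamedTuple):
    """One row of a hypothetical word-pair data set."""
    word1: str
    word2: str
    f: int    # joint frequency of the pair (must be > 0 for the t-score)
    f1: int   # marginal frequency of word1
    f2: int   # marginal frequency of word2


def t_score(rec: PairRecord, n: int) -> float:
    """t-score from joint and marginal frequencies: (f - E) / sqrt(f)."""
    expected = rec.f1 * rec.f2 / n
    return (rec.f - expected) / math.sqrt(rec.f)


def rank_pairs(
    records: List[PairRecord],
    n: int,
    measure: Callable[[PairRecord, int], float],
) -> List[Tuple[float, PairRecord]]:
    """Annotate each record with a score and sort by descending score."""
    scored = [(measure(rec, n), rec) for rec in records]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


# Placeholder words and invented frequencies, for illustration only.
TOY_SAMPLE_SIZE = 1000
TOY_RECORDS = [
    PairRecord("gamma", "delta", f=10, f1=100, f2=100),
    PairRecord("alpha", "beta", f=30, f1=40, f2=50),
    PairRecord("epsilon", "zeta", f=5, f1=10, f2=20),
]

if __name__ == "__main__":
    ranked = rank_pairs(TOY_RECORDS, TOY_SAMPLE_SIZE, t_score)
    for rank, (score, rec) in enumerate(ranked, start=1):
        print(f"{rank}\t{rec.word1} {rec.word2}\t{score:.3f}")
```

    Passing the measure in as a function means the same ranking code can be
    reused with any other association measure that takes a record and a
    sample size.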

    The UCS toolkit provides reference implementations for all association
    measures listed in the on-line repository above. It is open source software,
    based on the freely available Perl (www.perl.com) and R (www.r-project.org)
    languages, and should work on most modern Unix-like operating systems
    (with experimental support for Windows under the Cygwin emulation layer).

    For more information and downloads, please turn to

      http://www.collocations.de/software.html

    or go to

      http://www.collocations.de/

    and click on "Software".

    Best Wishes and a Happy Easter,
    Stefan Evert.


