[Corpora-List] KOLOKACJE program

From: Beata Wójtowicz (wierzchob@wp.pl)
Date: Fri Apr 09 2004 - 00:08:04 MET DST


    On behalf of Aleksander Buczyński, I would like to announce the availability
    of a new program that combines a web crawler and a collocation finder:
    "Kolokacje".
    The program was written by Aleksander Buczyński and is distributed free of
    charge under the GNU General Public License.

    The program can be used to:
    - build a corpus of texts from selected websites, with an option to filter
    out most of the HTML "noise" (duplicate pages, menus etc.);
    - monitor changes on selected websites;
    - find strong and/or frequent collocations;
    - find keywords for a collection of documents;
    - get sample contexts (concordances) for given words or collocations;
    - compare 14 different statistical tests used for collocation detection.
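
    To give a rough idea of what such tests compute, here is a minimal Python
    sketch of two widely used association measures, pointwise mutual
    information (PMI) and the t-score, for adjacent word pairs. It is not taken
    from Kolokacje (which is written in Java), and whether these two measures
    are among its 14 tests is an assumption; the function name bigram_scores is
    hypothetical.

```python
import math
from collections import Counter


def bigram_scores(tokens):
    """Return {(w1, w2): (pmi, t_score)} for every adjacent word pair.

    Illustrative only: uses the token count as the sample size and
    does no smoothing or frequency thresholding.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))

    scores = {}
    for (w1, w2), observed in bigrams.items():
        # Expected pair count if w1 and w2 occurred independently.
        expected = unigrams[w1] * unigrams[w2] / n
        # PMI: how much more often the pair occurs than chance predicts.
        pmi = math.log2(observed / expected)
        # t-score: difference from chance, scaled by sqrt(observed).
        t_score = (observed - expected) / math.sqrt(observed)
        scores[(w1, w2)] = (pmi, t_score)
    return scores


if __name__ == "__main__":
    text = "new york is a new city in new york state".split()
    ranked = sorted(bigram_scores(text).items(),
                    key=lambda item: item[1][1], reverse=True)
    for pair, (pmi, t) in ranked[:3]:
        print(pair, round(pmi, 3), round(t, 3))
```

    PMI tends to favour rare but exclusive pairs, while the t-score favours
    frequent ones, which is one reason it is useful to compare several tests
    on the same data.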

    The program can be accessed in a number of ways:
    - through a simple graphical interface, provided by
    kolokacje.standalone.SAMain and kolokacje.standalone.SAManager (this is
    the easiest way to get familiar with the basic functions);
    - by calling selected modules from the shell command line;
    - by calling selected methods from your own Java program;
    - by using kolokacje.server.PrettyPrinter and kolokacje.server.QueryServer to
    build a web-based interface;
    - by using kolokacje.server.PrettyPrinter to run queries from a console and
    then viewing the results in an HTML browser.

    For more information and downloads, please see
    http://www.mimuw.edu.pl/polszczyzna/kolokacje/index-en.htm

    Kind regards,
    Beata Wojtowicz


