Re: [Corpora-List] token clustering tool

From: Tony Berber Sardinha (tony4@uol.com.br)
Date: Tue May 11 2004 - 15:32:41 MET DST


    Hi Murk

    (1) Simple chunker:
    -First, upload your corpus at http://lael.pucsp.br/corpora/enviar and obtain a
    password
    -Then go to http://lael.pucsp.br/corpora/ngrama/index.html, enter your password
    and the cluster size, and click on "Fazer" (Portuguese for "do")
    -See results
    (2) N-gram Statistics Package v.0.5 (by Ted Pedersen and Satanjeev Banerjee)
    -First, upload your corpus at http://lael.pucsp.br/corpora/enviar and obtain a
    password
    -Then go to http://lael.pucsp.br/corpora/nsp/index.html, enter your password and
    the other options, and click on "Fazer" (Portuguese for "do")
    -See results
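
    As a rough illustration of what a "cluster size" setting controls, here is a
    minimal Python sketch. It assumes a cluster is simply a contiguous sequence of
    n word tokens, counted across the whole corpus; `tokenize` and `ngram_clusters`
    are hypothetical names, and this is not the code behind the LAEL chunker or NSP.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase word tokens made of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def ngram_clusters(tokens, n):
    """Count every contiguous sequence of n tokens (a "cluster" of size n)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


if __name__ == "__main__":
    text = "the cat sat on the mat and the cat sat down"
    counts = ngram_clusters(tokenize(text), 2)
    # Print the most frequent clusters first, with their frequencies.
    for gram, freq in counts.most_common(3):
        print(freq, " ".join(gram))
```

    Raw frequency favours clusters of very common words; ranking by an association
    measure instead, as NSP does, is one way to surface pairs that co-occur more
    often than chance would predict.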

    If you're on Linux, Mac OS X, Unix or Cygwin, I can send you a simple Unix shell
    script for that.

    cheers
    tony.
    -------------------------------------
    Dr Tony Berber Sardinha
    LAEL, PUC/SP
    (Catholic University of Sao Paulo, Brazil)
    tony4@uol.com.br
    http://lael.pucsp.br/~tony
    [New website]

    ----- Original Message -----
    From: "Murk Wuite" <Murk@polderland.nl>
    To: <CORPORA@HD.UIB.NO>
    Sent: Tuesday, 11 May 2004 04:24
    Subject: [Corpora-List] token clustering tool

    Dear all,

    Does anyone know of a tool (or algorithm), preferably available freely
    for research purposes, that takes as its input a corpus only and
    produces as its output clusters of tokens that occur close to each other
    relatively often?
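
    One common way to make "close to each other relatively often" concrete is to
    count token pairs that fall within a fixed window and rank them by an
    association measure. The minimal sketch below assumes pointwise mutual
    information (PMI) as that measure and an illustrative window size; the helpers
    `window_cooccurrences` and `pmi_scores` are hypothetical, and estimating P(x)
    from token counts but P(x,y) from pair counts is a simplifying assumption.

```python
import math
from collections import Counter


def window_cooccurrences(tokens, window):
    """Count unordered pairs of distinct tokens at most `window` positions apart."""
    pairs = Counter()
    for i, left in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            right = tokens[j]
            if left != right:  # skip a token paired with itself
                pairs[tuple(sorted((left, right)))] += 1
    return pairs


def pmi_scores(tokens, window, min_count=2):
    """Rank pairs by PMI = log2(P(x,y) / (P(x) * P(y))), highest first."""
    unigrams = Counter(tokens)
    pairs = window_cooccurrences(tokens, window)
    n_tokens = len(tokens)
    n_pairs = sum(pairs.values())
    scores = {}
    for (x, y), count in pairs.items():
        if count < min_count:
            continue  # PMI overrates rare pairs, so drop them
        p_xy = count / n_pairs
        p_x = unigrams[x] / n_tokens
        p_y = unigrams[y] / n_tokens
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

    Grouping the top-scoring pairs (for example, linking pairs that share a token)
    would then give larger clusters; which measure and window work best depends on
    the corpus.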

    Best wishes,

    Murk Wuite
    MA student at the Department of Language and Speech, Katholieke
    Universiteit Nijmegen, The Netherlands



    This archive was generated by hypermail 2b29 : Tue May 11 2004 - 15:41:49 MET DST