Re: Corpora: Creating wordlists / 2-5 word clusters / **freq = 1**

From: Tony Berber Sardinha (tony4@uol.com.br)
Date: Wed Apr 04 2001 - 18:12:16 MET DST


    Dear Mark

    You can get around this limitation in WordSmith Tools by selecting your
    corpus files twice. Every count is then doubled, so any cluster reported
    with a frequency of 2 actually has a frequency of 1.
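
For anyone who would rather script this than work around the tool, here is a minimal Python sketch of the same idea. It is not WordSmith's method: the tokenizer, the function names (tokenize, count_clusters, merge_counts) and the sample sentence are all assumptions made for illustration. It counts 2-5 word clusters with no minimum frequency, merges counts from separate chunks, and shows why loading each file twice lets a minimum frequency of 2 pass clusters that really occur once.

```python
import re
from collections import Counter


def tokenize(text):
    # Crude tokenizer (an assumption): lowercase runs of letters, digits, apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def count_clusters(tokens, n_min=2, n_max=5):
    # Count every contiguous cluster of n_min..n_max words; nothing is filtered out.
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def merge_counts(chunk_counts):
    # Add up the counts obtained from separately processed chunks.
    total = Counter()
    for counts in chunk_counts:
        total.update(counts)
    return total


# Hypothetical sample text standing in for one corpus file.
text = "the cat sat on the mat and the cat slept"
single = count_clusters(tokenize(text))

# Simulate "selecting the file twice": the same file is counted two times.
doubled = merge_counts([single, single])

# Apply a minimum frequency of 2 to the doubled counts, then halve them
# to recover the true frequencies, including clusters that occur once.
true_freq = {cluster: freq // 2 for cluster, freq in doubled.items() if freq >= 2}
```

Merging per-chunk counts this way misses any cluster that straddles a chunk boundary, so it is safer to split chunks at text boundaries. For multi-million-word texts the number of distinct 4-5 word clusters can get very large, so memory use is worth watching.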

    cheers
    tony.
    -------------------------------------
    Dr Tony Berber Sardinha
    Catholic University of Sao Paulo, Brazil
    tony4@uol.com.br
    www.tonyberber.f2s.com
    >
    > -----Original Message-----
    > From: Mark Davies <mdavies@ilstu.edu>
    > To: <CORPORA@HD.UIB.NO>
    > Sent: Tuesday, April 3, 2001 09:12
    > Subject: Corpora: Creating wordlists / 2-5 word clusters / **freq = 1**
    >
    >
    > Can anyone recommend a PC-based program that creates wordlists with the
    > following three characteristics:
    >
    > 1) 2 / 3 / 4 / 5 word clusters
    > 2) ** clusters that occur only once **
    > 3) wordlists of multi-million word texts (can do smaller chunks and merge
    > them together)
    >
    > For my present needs, #2 is the most important. I've been using WordSmith,
    > and it can of course create wordlists of word clusters, but purposely
    > limits the lists to only those clusters that occur two times or more. (In
    > Settings / Min/Max Frequencies / Word Frequency you can set it as low as 1,
    > but for 2+ word clusters it won't actually return any clusters with a
    > frequency less than 2). This limitation does make sense, since the number
    > of clusters that occur only once will be extremely large -- easily in the
    > millions of distinct strings for 4-5 word clusters. Nevertheless, for a
    > project that I am doing, this is (unfortunately) exactly what I need to do.
    >
    > Thanks in advance for your help.
    >
    > Mark Davies
    >
    > =======================================
    > Mark Davies, Associate Professor, Spanish Linguistics
    > http://mdavies.for.ilstu.edu/
    >
    > "Where is the wisdom we have lost in knowledge?
    > Where is the knowledge we have lost in information?"
    > -- T.S. Eliot
    >
    > 4300 Foreign Languages
    > Illinois State University
    > Normal, IL 61790-4300
    > Voice:309/438-7975 / Fax:309/438-8038
    > =======================================
    >


