Corpora: Creating wordlists / 2-5 word clusters / **freq = 1**

From: Mark Davies (mdavies@ilstu.edu)
Date: Tue Apr 03 2001 - 14:12:54 MET DST

    Can anyone recommend a PC-based program that creates wordlists with the
    following three characteristics:

    1) 2 / 3 / 4 / 5 word clusters
    2) ** includes clusters that occur only once (freq = 1) **
    3) handles multi-million-word texts (processing smaller chunks and
    merging the results is fine)

    For my present needs, #2 is the most important. I've been using WordSmith,
    and it can of course create wordlists of word clusters, but purposely
    limits the lists to only those clusters that occur two times or more. (In
    Settings / Min/Max Frequencies / Word Frequency you can set it as low as 1,
    but for 2+ word clusters it won't actually return any clusters with a
    frequency less than 2). This limitation does make sense, since the number
    of clusters that occur only once will be extremely large -- easily in the
    millions of distinct strings for 4-5 word clusters. Nevertheless, for a
    project that I am doing, this is (unfortunately) exactly what I need to do.
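
    To make the request concrete, here is a minimal Python sketch of the
    kind of output I mean. It is an illustration only, not an existing tool:
    the function names (tokenize, count_clusters, merge_counts) and the
    tokenization rule are assumptions made up for this example, and it keeps
    every count in memory, which may not scale to the largest texts.

```python
from collections import Counter
import re


def tokenize(text):
    # Assumed rule for this sketch: lowercase the text and treat runs of
    # letters/digits (with internal apostrophes) as words.
    return re.findall(r"[^\W_]+(?:'[^\W_]+)*", text.lower())


def count_clusters(tokens, min_n=2, max_n=5):
    # Count every cluster of min_n..max_n consecutive words.
    # No minimum-frequency filter is applied, so clusters with freq = 1 stay.
    counts = Counter()
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def merge_counts(chunk_counts):
    # Add up the per-chunk counts into one combined list.
    total = Counter()
    for counts in chunk_counts:
        total.update(counts)
    return total


# Two tiny "chunks" standing in for pieces of a much larger text.
chunks = ["the cat sat on the mat", "the cat sat down"]
merged = merge_counts(count_clusters(tokenize(c)) for c in chunks)

# Print clusters by descending frequency, including those that occur once.
for cluster, freq in merged.most_common():
    print(freq, cluster)
```

    Note that counting chunks separately and merging them, as in this
    sketch, misses any cluster that straddles a chunk boundary.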

    Thanks in advance for your help.

    Mark Davies

    =======================================
    Mark Davies, Associate Professor, Spanish Linguistics
    http://mdavies.for.ilstu.edu/

    "Where is the wisdom we have lost in knowledge?
    Where is the knowledge we have lost in information?"
    -- T.S. Eliot

    4300 Foreign Languages
    Illinois State University
    Normal, IL 61790-4300
    Voice:309/438-7975 / Fax:309/438-8038
    =======================================


