Re: requests for corpora

John Milton (lcjohn@uxmail.ust.hk)
Tue, 25 Feb 1997 13:44:10 +0800 (HKT)

> ...To find the word frequencies in a language, any language, just get a
> corpus of texts and count them! Two. Word frequencies, except for the
> most frequent words (grammatical particles), will vary from sample to
> sample. Until one has decided what constitutes a representative sample
> for one's purposes, it is futile to speak of word frequencies. None of
> the requests I have seen, here, or relayed from LINGUIST, or on
> sci.lang, shows an awareness of this.

Agreed, but if you buy into the orthodoxy that there are representative
corpora that somehow are a snapshot of the language as a whole (e.g., the
BNC or Cobuild), then it must follow that the wordlists from at least
these corpora are authoritative in some way. Since most of the world
doesn't have access to these or other large and 'principally' collected
archives, is it so unusual that people are looking for some type of
representative wordlist? I've had to rely on the generosity of members of
this conference (thanks!) for such wordlists.
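
For anyone who wants to see the quoted point about sample-to-sample
variation in miniature, here is a rough sketch in Python. The two sample
strings are made up for illustration and are not drawn from the BNC,
Cobuild, or any real corpus; the function name word_frequencies and the
simple tokenizing pattern are likewise my own assumptions, not a standard
method. It only shows that a grammatical word like "the" keeps a similar
relative frequency across samples while content words do not.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Return relative frequencies (count / total tokens) of lowercase word tokens.

    Hypothetical helper: the tokenizer is a crude regex, an assumption
    made for illustration rather than a principled tokenization scheme.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter(tokens)
    return {word: count / total for word, count in counts.items()}

# Two tiny invented samples standing in for corpora (illustrative only).
sample_a = "the cat sat on the mat and the dog sat on the rug"
sample_b = "the bank raised the rate and the market fell on the news"

freq_a = word_frequencies(sample_a)
freq_b = word_frequencies(sample_b)

# Compare a few words side by side; words absent from a sample score 0.0.
for word in ("the", "on", "sat", "market"):
    print(word, round(freq_a.get(word, 0.0), 3), round(freq_b.get(word, 0.0), 3))
```

With samples this small the numbers mean nothing in themselves; the same
comparison run over two real collections is what would show how far a
given wordlist depends on the texts chosen.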
__________________________________________________
John Milton
lcjohn@uxmail.ust.hk
The Hong Kong University of Science and Technology