Re: requests for corpora

Gunnel Kallgren (gunnel@ling.su.se)
Tue, 25 Feb 1997 13:24:53 +0100

Jacques Guy wrote:

"To find the words frequencies in a language, any language,
just get a corpus of texts and count them!"

This is another example of Anglo-Saxon ethnocentrism. This statement holds
for English, not for languages with complex inflectional morphology. To
lemmatize all the words in a large corpus is a major undertaking that takes
time and resources and can never be satisfactorily done in a wholly
automatic way. To ask for lemmatized frequency lists for such languages is
neither silly nor lazy.
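
A small sketch may make the difference concrete. It assumes a hypothetical
hand-written lemma table for a few Swedish word forms (TOY_LEMMAS and both
function names are invented for this illustration; a real lemmatizer needs a
full lexicon and disambiguation in context). Counting raw word forms spreads
one lexeme over several entries, while counting lemmas groups them:

```python
from collections import Counter

# Hypothetical toy lemma table (illustration only): inflected forms of
# Swedish "flicka" (girl) and "bok" (book) mapped to their base forms.
TOY_LEMMAS = {
    "flicka": "flicka",
    "flickan": "flicka",
    "flickorna": "flicka",
    "bok": "bok",
    "boken": "bok",
}

def surface_counts(tokens):
    """Count word forms exactly as they appear in the text."""
    return Counter(tokens)

def lemma_counts(tokens, table):
    """Count lemmas; forms missing from the table are counted as themselves."""
    return Counter(table.get(token, token) for token in tokens)

tokens = "flickan ser boken flickorna ser flicka bok".split()

surface = surface_counts(tokens)
lemmas = lemma_counts(tokens, TOY_LEMMAS)

print(surface.most_common())
print(lemmas.most_common())
```

In the raw count each form of "flicka" appears once, so the verb form "ser"
looks like the most frequent word; in the lemma count "flicka" comes first
with three occurrences. The table lookup is the part that does not scale:
building and disambiguating it for a whole corpus is the manual work
referred to above.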

Gunnel Kallgren