Re: Corpora: Suitable software for producing lemmatised conc

Adam Kilgarriff
Fri, 6 Feb 1998 15:38:55 GMT

For English, lemmatisation doesn't get you very far unless you also
have a POS-tagger (part-of-speech disambiguation program). The
question, "is 'helping' a noun or a verb?" has to be addressed before
you know whether to associate it with the verb lemma "help" or the
noun lemma "helping". And, again for English, POS-tagging is much
harder than lemmatisation. So it would only make sense to have a
lemmatised concordancer if it either had a built-in POS-tagger or had
specific methods for handling POS-tags as part of the input.
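
To make the point concrete, here is a minimal Python sketch of the second
option: a concordancer lookup that takes POS-tagged text as its input. The
"word/TAG" input format, the tag names (NOUN, VERB and so on), the toy
lexicon, and the function names are all assumptions made up for
illustration; they are not the format or interface of any particular
tagger or concordancer.

```python
# Toy lemma lexicon keyed on (word form, POS tag). The entries and the
# tag names are illustrative assumptions, not a real tagset or lexicon.
LEMMAS = {
    ("helping", "VERB"): "help",
    ("helped", "VERB"): "help",
    ("helps", "VERB"): "help",
    ("helping", "NOUN"): "helping",
    ("helpings", "NOUN"): "helping",
}


def lemmatise(form, tag):
    """Return the lemma for a tagged form; fall back to the lowercased form."""
    form = form.lower()
    return LEMMAS.get((form, tag), form)


def parse_tagged(text):
    """Split 'word/TAG word/TAG ...' into (word, tag) pairs."""
    return [tuple(token.rsplit("/", 1)) for token in text.split()]


def concordance(text, lemma, tag):
    """Return the token positions whose tag and lemma both match the query."""
    hits = []
    for position, (form, token_tag) in enumerate(parse_tagged(text)):
        if token_tag == tag and lemmatise(form, token_tag) == lemma:
            hits.append(position)
    return hits


# Hypothetical tagged input: the same form "helping" occurs once as a
# noun and once as a verb.
TEXT = ("a/DET second/ADJ helping/NOUN of/PREP soup/NOUN "
        "was/VERB helping/VERB him/PRON recover/VERB")

verb_hits = concordance(TEXT, "help", "VERB")
noun_hits = concordance(TEXT, "helping", "NOUN")
```

Without the tag, the form "helping" alone cannot be assigned to either
lemma; with it, the lookup is a simple table access, which is why the hard
part is the tagging rather than the lemmatisation.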

The IMS tagger (which Max just mentioned) and CorpusBench, from
Textware in Denmark, are two systems that take the second option.


Adam Kilgarriff
Senior Research Fellow                      tel: (44) 1273 642919
Information Technology Research Institute        (44) 1273 642900
University of Brighton                      fax: (44) 1273 642908
Lewes Road
Brighton BN2 4GJ