Re: Corpora: Suitable software for producing lemmatised conc

Adam Kilgarriff (Adam.Kilgarriff@itri.brighton.ac.uk)
Fri, 6 Feb 1998 15:38:55 GMT

For English, lemmatisation doesn't get you very far unless you also
have a POS-tagger (part-of-speech disambiguation program). The
question, "is 'helping' a noun or a verb?" has to be addressed before
you know whether to associate it with the verb lemma "help" or the
noun lemma "helping". And, again for English, POS-tagging is much
harder than lemmatisation. So it would only make sense to have a
lemmatised concordancer if it either had a built-in POS-tagger or had
specific methods for handling POS-tags as part of the input.
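
To make the point concrete, here is a toy sketch in Python of the second option: a concordancer that takes POS-tags as part of its input and only then decides on the lemma. It is not taken from any of the systems mentioned in this thread. The lexicon, the tag names and the function names are all made up for illustration, and a real system would need a proper morphological lexicon rather than a four-entry table.

```python
# Toy illustration only: the lexicon entries and tag names are hypothetical.
# The lemma is looked up by (word form, POS tag), never by word form alone.
LEMMA_TABLE = {
    ("helping", "VERB"): "help",
    ("helped", "VERB"): "help",
    ("helping", "NOUN"): "helping",
    ("helpings", "NOUN"): "helping",
}


def lemmatise(token, pos):
    """Return the lemma for a (token, POS) pair.

    Falls back to the lowercased token when the pair is not in the table.
    """
    return LEMMA_TABLE.get((token.lower(), pos), token.lower())


def concordance(tagged_tokens, lemma, pos, width=2):
    """Return KWIC lines for tokens whose lemma and POS tag both match.

    tagged_tokens is a list of (token, tag) pairs, i.e. the POS-tags
    arrive as part of the input rather than being computed here.
    """
    words = [tok for tok, _ in tagged_tokens]
    lines = []
    for i, (tok, tag) in enumerate(tagged_tokens):
        if tag == pos and lemmatise(tok, tag) == lemma:
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


sentence = [
    ("She", "PRON"), ("was", "VERB"), ("helping", "VERB"), ("him", "PRON"),
    ("to", "PART"), ("a", "DET"), ("second", "ADJ"), ("helping", "NOUN"),
]

print(concordance(sentence, "help", "VERB"))     # matches only the verb use
print(concordance(sentence, "helping", "NOUN"))  # matches only the noun use
```

The same word form "helping" ends up under two different lemmas, and the only thing that separates them is the tag supplied with the input. Without the tag, the lookup has nothing to go on.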

The IMS tagger (which Max just mentioned) and CorpusBench, from
Textware in Denmark, are two systems that take the second option.

Adam

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Adam Kilgarriff
Senior Research Fellow                     tel:   (44) 1273 642919
Information Technology Research Institute         (44) 1273 642900
University of Brighton                     fax:   (44) 1273 642908
Lewes Road
Brighton BN2 4GJ                           email: Adam.Kilgarriff@itri.bton.ac.uk
UK                                         http://www.itri.bton.ac.uk/~Adam.Kilgarriff
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%