Re: Corpora: Summary of POS tagger evaluation

Andrew Harley (aharley@cup.cam.ac.uk)
Tue, 09 Feb 1999 10:11:58 +0000

At 10:05 PM 08/02/1999 -0500, Yen Ketty wrote:
>Dear netters,
>
>The following is the summary of POS tagger evaluation.
>Thank you to those who replied. Most of the replies
>referred me to papers. I also include recommendations and
>a past summary related to my query.
>
>Ketty Gann

Seeing my summary from over a year ago re-posted, I thought I had better
update it with some of our more recent findings. We have since tested more
taggers and found that the best performers were the CLAWS tagger from
Lancaster University and the ENGCG tagger from Lingsoft, although none of
the tested taggers scored in the supposedly standard 95%+ accuracy range
(at least not by our scoring criteria).
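Since the headline figure depends so much on how tokens are scored, here is
a minimal sketch of two possible criteria. It assumes each tagger's output
is a set of candidate tags per token (a tagger that sometimes leaves
residual ambiguity, as constraint-grammar taggers can, returns more than
one), and the Penn-style tags in the sample are purely illustrative. Strict
scoring only credits a single correct tag; lenient scoring credits any
output that still contains the gold tag.

```python
from typing import Sequence, Set


def _check_aligned(predicted, gold):
    """Both sequences must describe the same tokens, and there must be some."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must be aligned token for token")
    if not gold:
        raise ValueError("cannot score an empty sample")


def strict_accuracy(predicted: Sequence[Set[str]], gold: Sequence[str]) -> float:
    """Correct only if the tagger committed to exactly one tag and it is the gold tag."""
    _check_aligned(predicted, gold)
    return sum(p == {g} for p, g in zip(predicted, gold)) / len(gold)


def lenient_accuracy(predicted: Sequence[Set[str]], gold: Sequence[str]) -> float:
    """Correct if the gold tag is anywhere among the candidate tags left by the tagger."""
    _check_aligned(predicted, gold)
    return sum(g in p for p, g in zip(predicted, gold)) / len(gold)


# Hypothetical four-token sample: the second token was left ambiguous,
# the fourth was tagged wrongly.
gold = ["DT", "NN", "VBZ", "JJ"]
predicted = [{"DT"}, {"NN", "VB"}, {"VBZ"}, {"RB"}]
print(strict_accuracy(predicted, gold))   # 0.5
print(lenient_accuracy(predicted, gold))  # 0.75
```

The same output scores 50% or 75% depending only on the criterion, which is
why published accuracy figures are hard to compare directly.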

The ACL/COLING 1998 conference in Montreal included a paper describing
several of the theoretically best algorithms and comparing their
performance. We tested one of those algorithms, the memory-based learning
approach developed at Tilburg University, and found that, at least in its
basic form, it did not match the performance of CLAWS, which uses a simple
bigram model but has much better training data, idiom lists, etc.
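To make "a simple bigram model" concrete, here is a minimal sketch of a
textbook bigram HMM tagger decoded with the Viterbi algorithm. It is not
CLAWS itself: the add-one smoothing, the lower-casing of words, the tiny
training corpus and the Penn-style tags are all assumptions for
illustration. The point it shows is that the model is just tag-to-tag and
tag-to-word counts, so its quality rests almost entirely on the training
data it is given.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # pseudo-tag marking the start of a sentence


def train(tagged_sents):
    """Count tag bigrams and word/tag pairs from hand-tagged sentences."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    vocab = set()
    for sent in tagged_sents:
        prev = START
        for word, tag in sent:
            w = word.lower()
            trans[prev][tag] += 1
            emit[tag][w] += 1
            vocab.add(w)
            prev = tag
    return trans, emit, sorted(emit), vocab


def smoothed_logp(counter, key, outcomes):
    """Add-one smoothed log P(key | context) from a Counter of observed outcomes."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + outcomes))


def viterbi(words, trans, emit, tags, vocab):
    """Return the most probable tag sequence under the bigram model."""
    if not words:
        return []
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 leaves mass for unseen words

    def t(prev, tag):
        return smoothed_logp(trans[prev], tag, n_tags)

    def e(tag, word):
        return smoothed_logp(emit[tag], word.lower(), n_words)

    # best[i][tag] = (log score of best path ending in tag at i, previous tag)
    best = [{tag: (t(START, tag) + e(tag, words[0]), None) for tag in tags}]
    for i in range(1, len(words)):
        column = {}
        for tag in tags:
            prev, score = max(((p, best[i - 1][p][0] + t(p, tag)) for p in tags),
                              key=lambda pair: pair[1])
            column[tag] = (score + e(tag, words[i]), prev)
        best.append(column)
    # Trace back from the best-scoring final tag.
    path = [max(tags, key=lambda tag: best[-1][tag][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


corpus = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
]
model = train(corpus)
print(viterbi(["The", "cat", "barks"], *model))  # ['DT', 'NN', 'VBZ']
```

A real tagger would add better handling of unknown words (suffixes,
capitalisation) and far more training text; the decoding step stays the
same.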

Is there anyone out there who has combined the latest theoretical models
with large amounts of training data and, where necessary, hand-crafted
rules (e.g. idiom lists) to produce a truly superior practical solution?
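As one hypothetical illustration of such a hybrid, the sketch below combines
several taggers by per-token majority voting and then lets a hand-crafted
idiom list override the result for multiword units. This is an assumption
about how the pieces could be wired together, not a description of any
existing system. The function names, the tie-breaking rule (ties go to the
tagger listed first, e.g. the one trusted most), and the CLAWS7-style tags
and "ditto" tags for "in spite of" are illustrative only.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def majority_vote(outputs: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-token tags from several taggers; ties go to the tagger listed first."""
    if not outputs:
        raise ValueError("need at least one tagger's output")
    n = len(outputs[0])
    if any(len(o) != n for o in outputs):
        raise ValueError("all taggers must tag the same tokens")
    combined = []
    for i in range(n):
        column = [o[i] for o in outputs]
        counts = Counter(column)
        top = max(counts.values())
        # First tag in tagger order that reached the top count.
        combined.append(next(tag for tag in column if counts[tag] == top))
    return combined


def apply_idioms(words: Sequence[str], tags: Sequence[str],
                 idioms: Dict[Tuple[str, ...], Tuple[str, ...]]) -> List[str]:
    """Overwrite tags wherever a listed multiword unit occurs, preferring the longest match."""
    tags = list(tags)
    lowered = [w.lower() for w in words]
    longest = max((len(k) for k in idioms), default=0)
    i = 0
    while i < len(words):
        for size in range(min(longest, len(words) - i), 0, -1):
            unit = tuple(lowered[i:i + size])
            if unit in idioms:
                tags[i:i + size] = idioms[unit]
                i += size
                break
        else:
            i += 1  # no listed unit starts here
    return tags


# Hypothetical outputs of three taggers for one sentence.
words = ["He", "left", "in", "spite", "of", "the", "rain"]
outputs = [
    ["PPHS1", "VVD", "II", "NN1", "IO", "AT", "NN1"],
    ["PPHS1", "VVD", "II", "NN1", "IO", "AT", "NN1"],
    ["PPHS1", "VVN", "RP", "VV0", "IO", "AT", "NN1"],
]
idioms = {("in", "spite", "of"): ("II31", "II32", "II33")}
print(apply_idioms(words, majority_vote(outputs), idioms))
# ['PPHS1', 'VVD', 'II31', 'II32', 'II33', 'AT', 'NN1']
```

Applying the idiom list after voting means the hand-crafted rules always
win where they fire, which mirrors the idea of using rules only "where
necessary" on top of the statistical models.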

Andrew Harley
Systems Development Manager - ELT Reference
Cambridge University Press

Direct line: (01223)325880
Fax: (01223)325984

http://www.cup.cam.ac.uk/elt/reference