Seeing my summary from over a year ago re-posted, I thought I had better
update it with some of our more recent findings. We tested more taggers,
and found that the best performers were the CLAWS tagger from Lancaster
University and the ENGCG tagger from Lingsoft, although none of the tested
taggers scored in the supposed standard 95%+ range (at least not by our
scoring criteria).

The ACL/COLING 1998 conference in Montreal included a paper describing the
main state-of-the-art algorithms and comparing their performance.
We tested one of those algorithms, the memory-based learning approach
developed at Tilburg University, and found that, at least in its basic form,
it didn't match the performance of CLAWS, which uses a simple bigram model
but with much better training data, idiom lists, etc.

Is there anyone out there who has combined the latest theoretical models
with large amounts of training data and hand-crafted rules (e.g. idiom
lists) where necessary to produce a truly superior practical solution?

Andrew Harley
Systems Development Manager - ELT Reference
Cambridge University Press
Direct line: (01223)325880
Fax: (01223)325984
http://www.cup.cam.ac.uk/elt/reference