Re: Corpora: Summary of POS tagger evaluation

Oliver Mason (oliver@clg.bham.ac.uk)
Tue, 9 Feb 1999 17:56:04 +0000

Next message: eric@scs.leeds.ac.uk: "Corpora: JOBS at LEEDS Uni: research & teaching"
Previous message: Thorsten Brants: "Re: Corpora: Summary of POS tagger evaluation"
Maybe in reply to: Yen Ketty: "Corpora: Summary of POS tagger evaluation"
Next in thread: Ted E. Dunning: "Re: Corpora: Summary of POS tagger evaluation"

Thorsten Brants <thorsten@CoLi.Uni-SB.DE>:

one reason for _not_ excluding unambiguous words is sparse data: how do
you know that a word is unambiguous? Just that it has only one tag in
the lexicon is not sufficient because the correct tag may not be listed.

But that is not a problem of measuring the performance of the tagger.
If the tagger thinks a word is not ambiguous but you do, then that's a
problem with the lexicon. I agree that this can have a rather bad
effect on the correctness, but then you shouldn't count it as an error
during the evaluation. After all, the tagger does not know that there
are other tags. Unless, of course, you always assign all possible tags
to all words, which might be an interesting experiment...

If you exclude unambiguous words from scoring, you really would need two
different accuracy results in order to describe the performance of a
tagger: one for ambiguous words, the other one for ``unambiguous''
words.

But how would you measure the second? Give the tagger a point each time
it assigns `DET' to `the'? (Please note that I do count unambiguous tokens,
and even punctuation, for evaluation purposes; but the whole scoring only
makes sense together with the additional complexity metric.)
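
To make the two-score idea concrete, here is a minimal sketch of how accuracy could be split by lexicon ambiguity. The function name, the argument layout and the toy data are my own assumptions for illustration, not something taken from the LREC paper or from any particular tagger.

```python
def accuracy_split(gold, predicted, n_tags):
    """Return (overall, ambiguous-only, unambiguous-only) accuracy.

    gold, predicted: parallel lists of tags, one per token.
    n_tags: number of tags the lexicon lists for each token (assumed input).
    A score is None when its group contains no tokens.
    """
    def acc(pairs):
        pairs = list(pairs)
        if not pairs:
            return None
        return sum(g == p for g, p in pairs) / len(pairs)

    rows = list(zip(gold, predicted, n_tags))
    overall = acc((g, p) for g, p, _ in rows)
    ambiguous = acc((g, p) for g, p, n in rows if n > 1)
    unambiguous = acc((g, p) for g, p, n in rows if n == 1)
    return overall, ambiguous, unambiguous


# Hypothetical four-token example: "the can rusts ."
gold = ["DET", "NOUN", "VERB", "PUNCT"]
predicted = ["DET", "VERB", "VERB", "PUNCT"]
n_tags = [1, 3, 2, 1]

overall, ambiguous, unambiguous = accuracy_split(gold, predicted, n_tags)
print(overall, ambiguous, unambiguous)
```

With a complete lexicon the third score carries no information, since every token with a single listed tag is tagged correctly by construction.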

For those of you who can't get hold of the LREC paper: we mention three
different metrics, simple/non-punctuation/ambiguity, where

SM  = average number of tags per token, over all tokens
NPM = the same, excluding punctuation tokens
AM  = the same, for ambiguous tokens only
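
As a rough sketch of how these three averages could be computed: the code below assumes a lexicon given as a dict from word form to its set of possible tags, and treats a token as punctuation if it consists only of ASCII punctuation characters. Both of these, and all the names, are assumptions made for the example.

```python
import string


def ambiguity_metrics(tokens, lexicon):
    """Return (SM, NPM, AM) for a list of tokens.

    lexicon: dict mapping each word form to its set of possible tags
    (assumed to cover every token in the text).
    """
    def mean(values):
        values = list(values)
        return sum(values) / len(values) if values else 0.0

    def is_punct(tok):
        # Assumption: punctuation tokens are made of ASCII punctuation only.
        return all(ch in string.punctuation for ch in tok)

    counts = [(tok, len(lexicon[tok])) for tok in tokens]
    sm = mean(n for _, n in counts)                         # all tokens
    npm = mean(n for tok, n in counts if not is_punct(tok)) # no punctuation
    am = mean(n for _, n in counts if n > 1)                # ambiguous only
    return sm, npm, am


# Hypothetical toy lexicon and text.
lexicon = {
    "the": {"DET"},
    "can": {"NOUN", "VERB", "AUX"},
    "rusts": {"NOUN", "VERB"},
    ".": {"PUNCT"},
}
tokens = ["the", "can", "rusts", "."]

sm, npm, am = ambiguity_metrics(tokens, lexicon)
print(sm, npm, am)  # 1.75 2.0 2.5
```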

The results for two test texts (Orwell's ``Nineteen Eighty Four'' and Plato's
``Republic'', both in Romanian) are:

text     SM     NPM    AM

Orwell   1.55   1.60   2.49
Plato    1.63   1.72   2.37

So the average number of tags over ambiguous tokens only (AM) was higher
in the Orwell text, while the averages over all tokens (SM) and over
non-punctuation tokens (NPM) were higher for Plato.

The overall complexity of a text to be tagged is the product of any one of
the above scores and a score similar to entropy computed for the tagset that
is being used.
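
A small sketch of that product follows. The exact entropy-like score is defined in the paper and not reproduced here, so the code substitutes plain Shannon entropy (in bits) over the relative tag frequencies of a tagged sample; that substitution, the function names and the toy tag list are all assumptions.

```python
import math
from collections import Counter


def tag_entropy(tags):
    """Shannon entropy (bits) of the tag distribution in a tagged sample.

    Assumption: this stands in for the 'score similar to entropy';
    the actual formula in the paper may differ.
    """
    counts = Counter(tags)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def text_complexity(ambiguity_score, tags):
    """Product of one ambiguity score (SM, NPM or AM) and the tagset score."""
    return ambiguity_score * tag_entropy(tags)


# Hypothetical tag sample: NOUN half the time, DET and VERB a quarter each.
sample_tags = ["DET", "NOUN", "VERB", "NOUN"]

entropy = tag_entropy(sample_tags)
complexity = text_complexity(1.55, sample_tags)  # 1.55 = SM for Orwell above
print(entropy, complexity)
```

The point of the product is that the same ambiguity score means more work for the tagger when the tagset itself is larger or more evenly distributed.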

Oliver

-- 
//\\ computer officer | corpus research | department of english | school of  -
//\\ humanities | university of birmingham | edgbaston | birmingham b15 2tt  -
\\// united kingdom | phone +44-(0)121-414-6206 | fax +44-(0)121-414-5668/\  -
\\// mobile 07050 104504 | http://www-clg.bham.ac.uk | o.mason@bham.ac.uk\/  -
