sorry for the interruption.
Best regards,
Gabriel Pereira Lopes
Thorsten Brants wrote:
> O.Mason@bham.ac.uk wrote:
> > This raises an issue which is slightly more complex: if you exclude
> > punctuation (presumably on the grounds that a comma is always tagged
> > as `comma' and there is no ambiguity), why include other unambiguous
> > tokens in the scoring? If `the' always gets assigned `DET', and no
> > other tags for it are possible, then why count it and not the comma?
>
> one reason for _not_ excluding unambiguous words is sparse data: how do
> you know that a word is unambiguous? Just that it has only one tag in
> the lexicon is not sufficient because the correct tag may not be listed.
>
> If you exclude unambiguous words from scoring, you really would need two
> different accuracy results in order to describe the performance of a
> tagger: one for ambiguous words, the other one for ``unambiguous''
> words.
>
> -Thorsten
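
To make the two-score suggestion above concrete, here is a minimal sketch in Python. It is an illustration only, not anything taken from the thread: the function name split_accuracy, the dictionary-style lexicon, and the choice to count words missing from the lexicon as ambiguous are all assumptions. Note that "unambiguous" here only means "the lexicon lists exactly one tag", which is exactly the weakness pointed out above.

```python
from collections import defaultdict


def split_accuracy(tokens, gold, predicted, lexicon):
    """Return (ambiguous_accuracy, unambiguous_accuracy).

    `lexicon` is a hypothetical mapping from word to its set of listed tags.
    A word is treated as "unambiguous" only if the lexicon lists exactly one
    tag for it; words absent from the lexicon are counted as ambiguous
    (an assumption made for this sketch).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for word, gold_tag, predicted_tag in zip(tokens, gold, predicted):
        listed = lexicon.get(word, ())
        group = "unambiguous" if len(listed) == 1 else "ambiguous"
        total[group] += 1
        correct[group] += int(gold_tag == predicted_tag)

    def accuracy(group):
        # None signals that the group is empty, so no score can be given.
        return correct[group] / total[group] if total[group] else None

    return accuracy("ambiguous"), accuracy("unambiguous")


# Toy data, invented for illustration.
lexicon = {"the": {"DET"}, "can": {"NOUN", "VERB", "AUX"}}
tokens = ["the", "can", "can", "the"]
gold = ["DET", "NOUN", "AUX", "DET"]
predicted = ["DET", "VERB", "AUX", "DET"]

ambiguous_acc, unambiguous_acc = split_accuracy(tokens, gold, predicted, lexicon)
print(ambiguous_acc, unambiguous_acc)
```

Reporting both numbers side by side keeps errors on supposedly unambiguous words visible, for instance when the gold tag is one the lexicon does not list.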