lexical probability

FREEMAN ROBERT JOHN (lcrobf@uxmail.ust.hk)
Fri, 19 Apr 1996 19:11:05 +0800

> Can anyone explain to me why in the standard tagger model, the lexical
> probability is defined as the probability of the word given the tag rather
> than the tag given the word. The latter would seem much more intuitive
> (as well as easier to estimate), but is reported to give worse results
> (e.g. the discussion in Charniak's book p.50). Is there a good reason for this?
>
> Thanks,
>
> Pete
>
Intuitively speaking... it seems to me this must be because the
probability of the tag given the word is heavily context-dependent
(that's how statistical taggers work, right?), so it wouldn't give an
interesting *lexical* definition.
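
To make the two quantities concrete, here is a minimal sketch that
estimates both from raw counts. The corpus, the tag names, and the helper
function names are all made up for illustration; nothing here comes from
Charniak's book. As far as I understand the usual derivation, the tagger
picks the tag sequence T that maximizes P(T|W), which by Bayes' rule is
proportional to P(W|T)P(T), so P(word|tag) falls out once the tag
sequence model supplies P(T). The sketch only checks that, per token, the
two estimates are related by P(w|t)P(t) = P(t|w)P(w).

```python
from collections import Counter

# Hypothetical toy tagged corpus, invented purely for illustration.
corpus = [
    [("the", "DET"), ("can", "NOUN"), ("fell", "VERB")],
    [("we", "PRON"), ("can", "AUX"), ("go", "VERB")],
    [("the", "DET"), ("dog", "NOUN"), ("can", "AUX"), ("run", "VERB")],
]

pair_counts = Counter()  # counts of (word, tag) pairs
word_counts = Counter()  # counts of words
tag_counts = Counter()   # counts of tags
for sentence in corpus:
    for word, tag in sentence:
        pair_counts[(word, tag)] += 1
        word_counts[word] += 1
        tag_counts[tag] += 1
total = sum(word_counts.values())  # number of tokens


def p_word_given_tag(word, tag):
    """Relative-frequency estimate of P(word | tag)."""
    return pair_counts[(word, tag)] / tag_counts[tag]


def p_tag_given_word(tag, word):
    """Relative-frequency estimate of P(tag | word)."""
    return pair_counts[(word, tag)] / word_counts[word]


def p_tag(tag):
    """Unigram estimate of P(tag)."""
    return tag_counts[tag] / total


def p_word(word):
    """Unigram estimate of P(word)."""
    return word_counts[word] / total


# Both sides of Bayes' rule give the joint estimate P(word, tag).
lhs = p_word_given_tag("can", "NOUN") * p_tag("NOUN")
rhs = p_tag_given_word("NOUN", "can") * p_word("can")
print(lhs, rhs)
```

The two estimates differ only by the factor P(t)/P(w), so which one
belongs in the model depends on what the rest of the model already
accounts for.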

I guess the probability of the word given the tag *is* traditionally
considered to be independent of context. Clearly that's not completely
true in general either. Personally, I would be interested to know just
how untrue it is. I noticed some figures recently which suggested that
there are *fewer* syntactic restrictions on tag sequences than syntactic
restrictions on word sequences! If that is the case, then might it
actually be more sensible to think of language as permissible sequences
of words that can have many tags rather than as permissible sequences
of tags that can have many words?

Rob Freeman
Hong Kong University of Science and Technology
lcrobf@usthk.ust.hk