There is a good reason: the tagger model uses a Hidden Markov Model in
which the observed output symbols are the words and the hidden
structure is the sequence of tags.
The HMM gives an estimate of the probability p(W) that a sequence of
words is produced. Decomposing this probability, the tagger model is
(in the case of a tri-class model):
p(W) = p(w1 w2 ... wn)
     = prod_i p(wi | wi-2, wi-1)                 // word model: depends on the last 2 words
     = sum_T prod_i p(wi | ti) * p(ti | ti-2, ti-1)
                                                 // tag model: the word depends on its tag only,
                                                 // summed over the hidden tag sequences T
Thus the term that arises naturally is p(wi | ti), not p(ti | wi).
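To make the decomposition concrete, here is a minimal sketch (mine, not
from the tagger itself) that scores a sentence under a toy tri-class HMM
by summing over all hidden tag sequences. The tag set and all
probability tables are invented for illustration:

from itertools import product

# Hypothetical lexical probabilities p(word | tag)
p_word_given_tag = {
    ("the", "DET"): 0.6,
    ("dog", "NOUN"): 0.3,
    ("barks", "VERB"): 0.2,
    ("barks", "NOUN"): 0.01,
}

# Hypothetical tag-trigram probabilities p(tag | tag-2, tag-1);
# "<s>" pads the left context for the first two positions.
p_tag_given_context = {
    ("DET", "<s>", "<s>"): 0.5,
    ("NOUN", "<s>", "DET"): 0.7,
    ("VERB", "DET", "NOUN"): 0.6,
    ("NOUN", "DET", "NOUN"): 0.1,
}

TAGS = ["DET", "NOUN", "VERB"]

def p_sentence(words):
    """p(W) = sum over tag sequences T of prod_i p(wi|ti) * p(ti|ti-2,ti-1)."""
    total = 0.0
    for tags in product(TAGS, repeat=len(words)):
        padded = ("<s>", "<s>") + tags
        prob = 1.0
        for i, (w, t) in enumerate(zip(words, tags)):
            prob *= p_word_given_tag.get((w, t), 0.0)      # p(wi | ti)
            prob *= p_tag_given_context.get((t, padded[i], padded[i + 1]), 0.0)
        total += prob
    return total

print(p_sentence(["the", "dog", "barks"]))

(In a real tagger one would use the forward algorithm rather than this
exponential enumeration, but the factors multiplied together are
exactly the p(wi | ti) and p(ti | ti-2, ti-1) above.)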
However, Bayes' rule gives a simple relation:

p(wi | ti) = p(ti | wi) * p(wi) / p(ti)

so once the model is built, it is straightforward to produce either
set of probabilities.
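For instance, here is a small sketch (again mine, with an invented toy
corpus) showing that the direct relative-frequency estimate of
p(wi | ti) and the one obtained through the Bayes relation coincide:

from collections import Counter

# Hypothetical tagged corpus, just for illustration
tagged = [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB"),
          ("the", "DET"), ("bark", "NOUN"), ("bark", "VERB")]

n = len(tagged)
pair_count = Counter(tagged)
word_count = Counter(w for w, _ in tagged)
tag_count = Counter(t for _, t in tagged)

def p_word_given_tag(w, t):
    # Direct estimate: count(w, t) / count(t)
    return pair_count[(w, t)] / tag_count[t]

def p_word_given_tag_via_bayes(w, t):
    # Same quantity via p(t | w) * p(w) / p(t)
    p_t_given_w = pair_count[(w, t)] / word_count[w]
    return p_t_given_w * (word_count[w] / n) / (tag_count[t] / n)

assert abs(p_word_given_tag("bark", "NOUN")
           - p_word_given_tag_via_bayes("bark", "NOUN")) < 1e-12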