What makes a Markov model hidden?

Chris Manning (manning@lcl.cmu.edu)
Mon, 24 Apr 1995 20:44:00 -0400

On 24 April 1995, Helmut Feldweg wondered about what exactly makes a Markov
model hidden.

A Markov model is hidden when you cannot determine the state sequence it
passed through on the basis of the outputs you observed. Classifying
taggers as HMMs or just Markov models is slightly complicated because
taggers like Church's are mixed: they are trained in one mode and used
in the other.

For training on a tagged corpus, one can regard the outputs of such a
tagger as pairs consisting of a Markov model state (the tag) and a word.
In training it therefore isn't an HMM, because you can tell exactly which
state the Markov model is in at each time. However, when you do tagging
with the Viterbi algorithm, you are giving the tagger only the words and
asking it to tell you what states the machine passed through, and so you
are using the tagger as an HMM.
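
To make the two modes concrete, here is a minimal sketch in Python. It is
not the code of Church's or anyone else's tagger: the toy corpus, the names
train_visible and viterbi, and the add-alpha smoothing are all assumptions
made for illustration. Training just counts (tag, word) pairs, since the
state sequence is visible; tagging sees only the words and recovers the
state sequence with Viterbi. Words outside the training vocabulary are not
handled.

```python
from collections import Counter, defaultdict
import math

# Toy tagged corpus, invented for illustration: lists of (word, tag) pairs.
TAGGED = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("dogs", "NOUN"), ("bark", "VERB")],
]


def train_visible(corpus, alpha=0.1):
    """Estimate parameters by counting; the tags are observed, nothing is hidden."""
    trans, emit = defaultdict(Counter), defaultdict(Counter)
    for sent in corpus:
        prev = "<s>"  # hypothetical sentence-start state
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    tags = sorted(emit)
    vocab = sorted({w for sent in corpus for w, _ in sent})

    def logp(counter, key, n_outcomes):
        # Add-alpha smoothing (an assumption) keeps every probability nonzero.
        total = sum(counter.values())
        return math.log((counter[key] + alpha) / (total + alpha * n_outcomes))

    log_trans = {p: {t: logp(trans[p], t, len(tags)) for t in tags}
                 for p in ["<s>"] + tags}
    log_emit = {t: {w: logp(emit[t], w, len(vocab)) for w in vocab}
                for t in tags}
    return tags, log_trans, log_emit


def viterbi(words, tags, log_trans, log_emit):
    """Recover the most probable tag sequence given only the words."""
    # best[i][t] = (log prob of the best path ending in tag t at i, previous tag)
    best = [{t: (log_trans["<s>"][t] + log_emit[t][words[0]], None)
             for t in tags}]
    for w in words[1:]:
        row = {}
        for t in tags:
            score, prev = max((best[-1][p][0] + log_trans[p][t], p)
                              for p in tags)
            row[t] = (score + log_emit[t][w], prev)
        best.append(row)
    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for row in reversed(best[1:]):
        tag = row[tag][1]
        path.append(tag)
    return path[::-1]


tags, log_trans, log_emit = train_visible(TAGGED)
print(viterbi(["the", "cat", "barks"], tags, log_trans, log_emit))
```

The point of the sketch is only that train_visible never has to guess a
state, while viterbi never gets to see one.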

Taggers like the Xerox tagger are true HMM taggers in the sense that the
training is also done via an HMM -- the tagger sees only the output words,
and has to guess which part of speech sequence the HMM is moving through.
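
For contrast, training in HMM mode can be sketched as one step of
Baum-Welch (forward-backward) reestimation on untagged text. This is a
generic textbook sketch, not the Xerox tagger's implementation: the
function names, the two-state model, the integer word indices and the
starting parameters are all assumptions for illustration, and it works
with raw probabilities, so it is only suitable for short sequences.

```python
import math


def forward_backward(obs, pi, A, B):
    """Return (alpha, beta, likelihood) for one sequence of symbol indices."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]
    for i in range(n):
        alpha[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(n):
            alpha[t][j] = sum(alpha[t - 1][i] * A[i][j]
                              for i in range(n)) * B[j][obs[t]]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    return alpha, beta, sum(alpha[-1])


def normalize(row):
    total = sum(row)
    return [x / total for x in row]


def baum_welch_step(seqs, pi, A, B):
    """One EM reestimation step; the state sequences are never observed."""
    n, m = len(pi), len(B[0])
    pi_num = [0.0] * n
    A_num = [[0.0] * n for _ in range(n)]
    B_num = [[0.0] * m for _ in range(n)]
    for obs in seqs:
        alpha, beta, like = forward_backward(obs, pi, A, B)
        for t in range(len(obs)):
            for i in range(n):
                # Posterior probability of being in state i at time t.
                gamma = alpha[t][i] * beta[t][i] / like
                if t == 0:
                    pi_num[i] += gamma
                B_num[i][obs[t]] += gamma
                if t + 1 < len(obs):
                    for j in range(n):
                        # Posterior probability of the transition i -> j at t.
                        A_num[i][j] += (alpha[t][i] * A[i][j]
                                        * B[j][obs[t + 1]] * beta[t + 1][j] / like)
    return normalize(pi_num), [normalize(r) for r in A_num], [normalize(r) for r in B_num]


def log_likelihood(seqs, pi, A, B):
    return sum(math.log(forward_backward(obs, pi, A, B)[2]) for obs in seqs)


# Untagged "text": sequences of word indices only (invented toy data).
SEQS = [[0, 1, 2], [0, 1, 2], [1, 2]]
PI0 = [0.6, 0.4]
A0 = [[0.7, 0.3], [0.4, 0.6]]
B0 = [[0.5, 0.3, 0.2], [0.1, 0.4, 0.5]]

pi1, A1, B1 = baum_welch_step(SEQS, PI0, A0, B0)
print(log_likelihood(SEQS, PI0, A0, B0), log_likelihood(SEQS, pi1, A1, B1))
```

Here the counts that the tagged-corpus case reads straight off the data
are replaced by expected counts under the current model, which is exactly
the sense in which the states are hidden during training.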

To the extent that training is the most important part, I think the former
class should be regarded as Markov model taggers and the latter as HMM
taggers, but in reality the first kind is mixed.

> Was the Xerox tagger by Cutting, Kupiec and Sibun the first one to use
> a *hidden* Markov model?
> If so, what makes it more *hidden* than Church's and DeRose's taggers?

The Xerox tagger is more hidden than Church's and DeRose's taggers in the
sense mentioned above. I'm not an expert on the historiography, but it
wasn't the first HMM tagger -- just the first publicly available one. The
Cutting, Kupiec, Pedersen and Sibun tagger was a reimplementation of an
earlier tagger by Kupiec at Xerox. Separately, an HMM tagger was
described by G. F. Foster (1991, Master's thesis, McGill), and Merialdo's
tagger (1990) was a true HMM tagger, although the situation there was even
more mixed: initial parameter estimation was done in Markov model mode on
tagged text, with reestimation then done in HMM mode on untagged text.

Chris