What makes a Markov model hidden?

Ted Dunning (ted@crl.nmsu.edu)
Tue, 25 Apr 1995 13:02:46 -0600

the distinction between hidden and non-hidden markov models is (as an
earlier poster mentioned) based on whether the output of the model is
the actual sequence of states of the markov model.  for a hidden
markov model, the output is not the internal state sequence, but a
probabilistic function of it.  some formulations make the output
symbols a function of each internal state; others make them a
function of the transitions.  either way, having the output be a
deterministic function of the internal state sequence is just a
special case of the probabilistic one, and hidden markov models are a
superset of the direct models (since the output function can be the
identity function).
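
to make the distinction concrete, here is a little python sketch (the
two-state numbers are made up purely for illustration) that simulates
both kinds of model over the same underlying chain; the direct
model's output is the state sequence itself, while the hidden model's
output is drawn from a per-state output distribution:

    import numpy as np

    rng = np.random.default_rng(0)

    # made-up two-state chain: A[i, j] = P(next state = j | current = i)
    A = np.array([[0.9, 0.1],
                  [0.3, 0.7]])
    # made-up output distributions: B[i, k] = P(symbol k | state i)
    B = np.array([[0.8, 0.1, 0.1],
                  [0.1, 0.2, 0.7]])

    def sample_states(n, start=0):
        # the underlying chain; a direct markov model just emits this
        states = [start]
        for _ in range(n - 1):
            states.append(rng.choice(2, p=A[states[-1]]))
        return states

    def sample_hmm(n, start=0):
        # hidden model: the observer sees symbols drawn from B, not states
        states = sample_states(n, start)
        return states, [rng.choice(3, p=B[s]) for s in states]

    states, symbols = sample_hmm(10)
    print("hidden states:  ", states)    # invisible to the observer
    print("observed output:", symbols)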

in practice, for supervised training, the direct markov model is nice
because you can estimate the transition probabilities quite directly,
either by maximum likelihood or by smoothing the maximum likelihood
estimates in some way.  unfortunately, the direct markov model has a
pretty limited memory.
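
for the direct case (assuming the training data is just an observed
state sequence), the maximum likelihood estimate is nothing more than
normalized bigram counts; the add-constant smoothing below is one
crude way to do the smoothing, not the only one:

    import numpy as np

    def estimate_transitions(sequence, n_states, smooth=0.0):
        # ML estimate of A[i, j] = P(j | i); smooth > 0 gives simple
        # add-constant smoothing (and keeps unseen states from
        # producing a divide-by-zero row)
        counts = np.full((n_states, n_states), smooth)
        for i, j in zip(sequence, sequence[1:]):
            counts[i, j] += 1
        return counts / counts.sum(axis=1, keepdims=True)

    seq = [0, 0, 1, 0, 1, 1, 1, 0, 0, 0]
    print(estimate_transitions(seq, 2))              # raw ML estimate
    print(estimate_transitions(seq, 2, smooth=1.0))  # add-one smoothed
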
the hidden markov model can do better in cases where you want
somewhat more interesting behaviour, but the training method is a bit
trickier.  the method generally used is called either baum-welch
estimation or the forward-backward algorithm.  it starts from an
initial set of transition and output probabilities.  for each
position in the training sequence, a probability distribution over
the internal states is estimated from the observed symbols and
whatever is currently known about the internal state distributions;
these distributions are then used to re-estimate the transition and
output probabilities.  this process is repeated until a useful model
is derived.  the iteration is considerably more expensive than the
estimation process for the direct markov model, and there is always
some risk of convergence problems (regardless of any theoretical
guarantees).
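
here is a minimal numpy sketch of one baum-welch pass over a single
observed sequence (the names and toy numbers are mine; a serious
implementation would work in log space or rescale alpha and beta to
avoid underflow, and would iterate until the likelihood stops
improving):

    import numpy as np

    def forward(A, B, pi, obs):
        # alpha[t, i] = P(obs[0..t], state at t is i) under current model
        T, N = len(obs), len(pi)
        alpha = np.zeros((T, N))
        alpha[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        return alpha

    def backward(A, B, obs):
        # beta[t, i] = P(obs[t+1..] | state at t is i)
        T, N = len(obs), A.shape[0]
        beta = np.ones((T, N))
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        return beta

    def baum_welch_step(A, B, pi, obs):
        # one re-estimation pass; returns the updated model and the
        # likelihood of obs under the model that was passed in
        alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
        likelihood = alpha[-1].sum()
        # gamma[t, i]: posterior probability of state i at time t
        gamma = alpha * beta / likelihood
        # xi[t, i, j]: posterior probability of the i -> j transition
        # between times t and t + 1
        xi = (alpha[:-1, :, None] * A[None] *
              (B[:, obs[1:]].T * beta[1:])[:, None, :]) / likelihood
        new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        new_B = np.zeros_like(B)
        for k in range(B.shape[1]):
            new_B[:, k] = gamma[np.array(obs) == k].sum(axis=0)
        new_B /= gamma.sum(axis=0)[:, None]
        return new_A, new_B, gamma[0], likelihood

    A = np.array([[0.6, 0.4],      # initial guesses, not a true model
                  [0.5, 0.5]])
    B = np.array([[0.7, 0.3],
                  [0.2, 0.8]])
    pi = np.array([0.5, 0.5])
    obs = [0, 1, 1, 0, 0, 1, 0, 0]
    for _ in range(20):
        A, B, pi, ll = baum_welch_step(A, B, pi, obs)
    print("likelihood after 20 passes:", ll)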

another option for training such a system would be an evolutionary
programming framework.  this would allow the transition and output
functions to be expressed in a more general or more compact fashion;
one such representation might be a neural net, for instance.
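
just to make that concrete (this toy is my own, not a standard
recipe, and it uses plain parameter matrices pushed through a softmax
rather than a real neural net): hill-climb a flat parameter vector
with gaussian mutations, scoring each candidate by the log likelihood
a scaled forward pass assigns to the training output:

    import numpy as np

    rng = np.random.default_rng(1)

    def softmax_rows(W):
        # turn unconstrained rows into probability distributions
        e = np.exp(W - W.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    def log_likelihood(params, obs, n_states, n_symbols):
        # unpack the flat vector into row-stochastic A and B, then run
        # a scaled forward pass over the observed symbols
        nA = n_states * n_states
        A = softmax_rows(params[:nA].reshape(n_states, n_states))
        B = softmax_rows(params[nA:].reshape(n_states, n_symbols))
        alpha = np.full(n_states, 1.0 / n_states) * B[:, obs[0]]
        s = alpha.sum()
        ll, alpha = np.log(s), alpha / s
        for o in obs[1:]:
            alpha = (alpha @ A) * B[:, o]
            s = alpha.sum()
            ll, alpha = ll + np.log(s), alpha / s
        return ll

    n_states, n_symbols = 2, 3
    obs = [0, 2, 2, 0, 1, 2, 2, 0, 0, 2]      # toy training output
    best = rng.normal(size=n_states * n_states + n_states * n_symbols)
    best_ll = log_likelihood(best, obs, n_states, n_symbols)
    for _ in range(2000):
        trial = best + 0.1 * rng.normal(size=best.size)  # mutation
        trial_ll = log_likelihood(trial, obs, n_states, n_symbols)
        if trial_ll > best_ll:                # keep improvements only
            best, best_ll = trial, trial_ll
    print("best log likelihood found:", best_ll)

a population, crossover, or a real net in place of the matrices would
be the obvious elaborations.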

unsupervised training is really only of use for smoothing the
transition and output probabilities of a hidden markov model.  for
the direct model, it is difficult to imagine a situation where you
could actually extract information from untagged data (and since the
states are directly observed anyway, you don't need this information
in some sense).

hope this helps.