Re: What makes a Markov model hidden?

Ken Church (kwc@research.att.com)
Fri, 28 Apr 95 08:02 EDT

The Xerox tagger is more hidden than Church's and DeRose's taggers in the
sense mentioned above. I'm not an expert on the historiography, but it
wasn't the first HMM tagger -- just the first publicly available one. The
Cutting, Kupiec, Pedersen and Sibun tagger was a reimplementation of an
earlier tagger by Kupiec at Xerox. But separately an HMM tagger was
described by G. F. Foster (1991) -- Master's thesis, McGill -- and Merialdo's
tagger (1990) was a true HMM tagger, although the situation was even more
mixed in this case in that initial parameter estimation was done in Markov
model mode on tagged text, with reestimation then being done in HMM mode on
untagged text.

I'm not sure anyone knows the ``historiography'' for sure, but I
suspect that HMM taggers go back to at least the early 1980s. Here is
a section from my 1988 paper:

Statistical ngram models were quite popular in the 1950s, and have been
regaining popularity over the past few years. The IBM speech group is
perhaps the strongest advocate of ngram methods, especially in other
applications such as speech recognition. Robert Mercer (private
communication, 1982) has experimented with the tagging application,
using a restricted corpus (laser patents) and small vocabulary (1000
words). Another group of researchers working in Lancaster around the
same time, Leech, Garside and Atwell, also found ngram models highly
effective; they report 96.7% success in automatically tagging the LOB
Corpus, using a bigram model modified with heuristics to cope with more
important trigrams. The present work developed independently from the
LOB project.
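
As a footnote to the distinction drawn earlier in this message between
"Markov model mode on tagged text" and "HMM mode on untagged text", here is
a minimal sketch of the first of these. When the tags are observed, nothing
is hidden and the parameters are just relative frequencies. This is not the
code of any tagger named in this thread. The toy corpus, the function names
(estimate_visible, prob, viterbi) and the add-one smoothing are all
assumptions made for illustration. Reestimation on untagged text would
start from tables like these and apply Baum-Welch, which is not shown.

```python
from collections import Counter, defaultdict

# Toy tagged corpus (hypothetical data, for illustration only).
TAGGED = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]

START = "<s>"  # pseudo-tag marking the start of a sentence


def estimate_visible(corpus):
    """Count tag bigrams and tag/word pairs from tagged text."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag] = count
    emit = defaultdict(Counter)   # emit[tag][word] = count
    for sentence in corpus:
        prev = START
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return trans, emit


def prob(counter, key, n_outcomes):
    """Add-one smoothed relative frequency (an assumed smoothing choice)."""
    return (counter[key] + 1) / (sum(counter.values()) + n_outcomes)


def viterbi(words, trans, emit):
    """Most probable tag sequence under the bigram model."""
    tags = sorted(emit)
    vocab = {w for t in emit for w in emit[t]}
    n_words = len(vocab) + 1  # one extra slot for unseen words
    # best[tag] = (probability of the best path ending in tag, that path)
    best = {
        t: (prob(trans[START], t, len(tags)) * prob(emit[t], words[0], n_words), [t])
        for t in tags
    }
    for word in words[1:]:
        new = {}
        for t in tags:
            p, path = max(
                ((best[s][0] * prob(trans[s], t, len(tags)), best[s][1]) for s in tags),
                key=lambda x: x[0],
            )
            new[t] = (p * prob(emit[t], word, n_words), path + [t])
        best = new
    return max(best.values(), key=lambda x: x[0])[1]


trans, emit = estimate_visible(TAGGED)
print(viterbi(["the", "dog", "sleeps"], trans, emit))  # ['DET', 'NOUN', 'VERB']
```

The same transition and emission tables are what an HMM tagger reestimates
from untagged text. The only difference in that setting is that the counts
become expected counts, because the tag sequence is no longer observed.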