Corpora: Info re using transitional frequencies as part of language recognition

Peter H. Fries (104462.1706@compuserve.com)
Sun, 9 May 1999 00:17:13 -0400

I am asking for some help. I am trying to contribute to an article which
examines the process of reading from the perspective of processing
language. In part, this article will present a reaction to the notion that
only poor readers rely on context as they identify words. In our view this
is not realistic. Indeed, I wish to say that context is a relevant part of
perception of every aspect of language. I wish to do this by looking at the
perception of phonemes, words, etc.
Now years ago (in the early 1960s), I read a bunch of literature on
experimental phonology, and found a number of references which compared the
results of programming a computer to recognize speech using only
information from each individual segment, with the results of using the
same recognition program but adding information about the transitional
frequencies of the various phonemes. Needless to say, the inclusion of
transitional frequencies in the program improved the accuracy of
recognition considerably. Does anyone know of more recent publications
which deal with the same sorts of comparisons? I am particularly interested
in estimates of the increase in accuracy which results from including this
information, not just **that** these transitional frequencies are
used/useful. (Yes, I know that there are problems with comparing human
perception with that of computers engaged in the same tasks. However, it
seems to me that computational approaches isolate some of the relevant
issues which humans need to cope with.)
I am also interested in the same sort of information concerning optical
character recognition for written language. That is, how much does it
improve the accuracy of an OCR program to include information on the
transitional frequencies of the letters?
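
To make the comparison concrete, here is a minimal sketch in Python of the two conditions: decoding each letter from its own segment score alone, and decoding with letter-to-letter transitional frequencies added. Everything in it is an illustrative assumption rather than a description of any published system: the tiny training word list, the made-up per-letter recognizer scores, the function names, and the choice of add-one smoothing with a Viterbi search.

```python
import math
from collections import Counter

def train_bigrams(words, alphabet):
    """Estimate add-one-smoothed transition probabilities P(b | a) from words."""
    pair_counts, first_counts = Counter(), Counter()
    for w in words:
        for a, b in zip(w, w[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    v = len(alphabet)
    return {(a, b): (pair_counts[(a, b)] + 1) / (first_counts[a] + v)
            for a in alphabet for b in alphabet}

def decode_independent(scores):
    """Pick the best-scoring letter at each position, ignoring context."""
    return "".join(max(s, key=s.get) for s in scores)

def decode_with_transitions(scores, trans):
    """Viterbi search: combine segment scores with transition probabilities."""
    # best[c] = (log-probability, letter sequence) of the best path ending in c
    best = {c: (math.log(p), c) for c, p in scores[0].items()}
    for s in scores[1:]:
        new = {}
        for c, p in s.items():
            prev = max(best, key=lambda a: best[a][0] + math.log(trans[(a, c)]))
            logp, path = best[prev]
            new[c] = (logp + math.log(trans[(prev, c)]) + math.log(p), path + c)
        best = new
    return max(best.values())[1]

# Hypothetical toy data: a small alphabet and a handful of training words.
alphabet = "abehinst"
trans = train_bigrams(["the", "then", "that", "this", "these", "thin"], alphabet)

# Hypothetical recognizer output for a three-letter word whose middle
# segment is ambiguous and slightly favours the wrong letter.
scores = [
    {"t": 0.90, "b": 0.10},
    {"h": 0.45, "b": 0.55},
    {"e": 0.90, "a": 0.10},
]

print(decode_independent(scores))              # segment information only
print(decode_with_transitions(scores, trans))  # with transitional frequencies
```

With these invented numbers the segment-only decoder yields "tbe", while the decoder that also uses the transitional frequencies recovers "the". The size of the improvement on real data is exactly the figure I am hoping the literature can supply; the toy example shows only the mechanism.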
Any help would be appreciated.
Thank you very much.

Peter H. Fries
Peter.H.Fries@cmich.edu