Re: Q: POS tagging of spoken language transcriptions

Eric K. Ringger (ringger@cs.rochester.edu)
Wed, 22 Nov 1995 11:48:43 -0500

particularly relevant (details below) as well as more information
about the Trains Spoken Dialogue Corpus available from the LDC on
CD-ROM.

1. Peter Heeman and James Allen. Detecting and Correcting Speech
Repairs. In Proceedings of the 32nd Annual Meeting of the Association
for Computational Linguistics, June 1994. Also available from the
Computation and Language E-Print Archive as cmp-lg/9406006.

Abstract: Interactive spoken dialog provides many new challenges for
spoken language systems. One of the most critical is the prevalence
of speech repairs. This paper presents an algorithm that detects and
corrects speech repairs based on finding the repair pattern. The
repair pattern is built by finding word matches and word replacements,
and identifying fragments and editing terms. Rather than using a set
of prebuilt templates, we build the pattern on the fly. In a fair
test, our method, when combined with a statistical model to filter
possible repairs, was successful at detecting and correcting 80% of
the repairs, without using prosodic information or a parser.
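
As a rough illustration of the surface cues this abstract names (word
matches, fragments, and editing terms), here is a minimal Python
sketch. It is not the algorithm from the paper: the function names,
the editing-term list, the trailing-hyphen fragment convention, and
the max_gap window are all assumptions made for the example, and it
only collects cues rather than building or correcting a repair.

```python
# Hypothetical editing-term list; the paper's actual list is not given here.
EDITING_TERMS = {"um", "uh", "er", "well", "okay"}


def find_word_matches(words, max_gap=4):
    """Return (i, j) pairs where words[i] == words[j] and 0 < j - i <= max_gap."""
    matches = []
    for i, word in enumerate(words):
        # Look ahead at most max_gap positions for an identical word.
        for j in range(i + 1, min(i + max_gap + 1, len(words))):
            if words[j] == word:
                matches.append((i, j))
    return matches


def repair_cues(words, max_gap=4):
    """Collect the positions of surface cues that may signal a speech repair."""
    return {
        "matches": find_word_matches(words, max_gap),
        # Assumed transcription convention: a fragment ends in a hyphen.
        "fragments": [i for i, w in enumerate(words) if w.endswith("-")],
        "editing_terms": [i for i, w in enumerate(words) if w in EDITING_TERMS],
    }


utterance = "take the eng- um the engine to corning".split()
cues = repair_cues(utterance)
```

For the invented utterance above, the repeated "the" is reported as a
word match, "eng-" as a fragment, and "um" as an editing term; a real
system would still need to decide whether these cues form a repair.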

2. Peter Heeman and James Allen. Tagging Speech Repairs. In ARPA
Workshop on Human Language Technology, March 1994.

Abstract: This paper describes a method of detecting speech repairs
that uses a part-of-speech tagger. The tagger is given knowledge about
category transitions for speech repairs, and so is able to mark a
transition either as a likely repair or as fluent speech. Other
contextual clues, such as editing terms, word fragments, and word
matchings, are also factored in by modifying the transition
probabilities.
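
The idea of adjusting transition probabilities with contextual clues
can be sketched as follows. This is a toy under stated assumptions,
not the tagger from the paper: every probability and boost factor is
made up for illustration, the names are hypothetical, and the decision
is made locally for one tag pair instead of inside a full tagger.

```python
# Made-up probabilities for a category transition in fluent speech
# versus across a repair; none of these numbers come from the paper.
P_FLUENT = {("DT", "NN"): 0.50, ("NN", "DT"): 0.02}
P_REPAIR = {("DT", "NN"): 0.001, ("NN", "DT"): 0.01}

# Hypothetical multipliers applied to the repair score when a cue is present.
CUE_BOOST = {"editing_term": 5.0, "fragment": 8.0, "word_match": 3.0}

FLOOR = 1e-6  # score for tag pairs missing from the toy tables


def label_transition(prev_tag, next_tag, cues=()):
    """Label one tag transition as 'repair' or 'fluent' by comparing scores."""
    fluent = P_FLUENT.get((prev_tag, next_tag), FLOOR)
    repair = P_REPAIR.get((prev_tag, next_tag), FLOOR)
    # Each observed cue scales up the repair score for this transition.
    for cue in cues:
        repair *= CUE_BOOST.get(cue, 1.0)
    return "repair" if repair > fluent else "fluent"


plain = label_transition("NN", "DT")
cued = label_transition("NN", "DT", cues=["fragment", "word_match"])
```

With these invented numbers, the noun-to-determiner transition is
labeled fluent on its own but is labeled a repair once a fragment and
a word match are observed at the same point.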

--Eric

---
Eric K. Ringger              mailto:ringger@cs.rochester.edu
Dept. of Computer Science    Office: +1-716-275-0922; Lab: +1-716-275-5377
University of Rochester      Fax: +1-716-461-2018
Rochester NY 14627-0226      http://www.cs.rochester.edu/u/ringger/
||||| | |  |  |   |   |    |     |      |     |    |   |   |  |  | | |||||