Corpora: New Resource

LDC Office (ldc@unagi.cis.upenn.edu)
Mon, 12 Jul 1999 17:00:02 EDT

The LDC is pleased to announce the beta release of
the American English Spoken Lexicon via LDC-Online.
The American English Spoken Lexicon contains
pronunciations captured in individual audio files for
53,611 of the most common words in English. These
words were extracted from the LDC CallHome American
English Lexicon (aka PRONLEX) with frequency
determined from a variety of media sources.

All words were recorded in a quiet, soundproof room
by a female graduate student who is a native speaker
of American English.

The files are in NIST SPHERE format. Each file is
named after the word it contains, and the files are
stored in 110 encyclopedia-style subdirectories. The
maximum recording levels of most files range from 6.0
to 7.1 dB.
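
For readers unfamiliar with the format, a NIST SPHERE file begins with
a plain ASCII header: the string NIST_1A, the header size in bytes,
then one "name type value" field per line, closed by end_head. The
Python sketch below reads such a header. It is an illustration only,
not LDC software; the function name and the demo field values are
made up for the example and are not taken from the lexicon's files.

```python
import io


def parse_sphere_header(stream):
    """Read the ASCII header of a NIST SPHERE file from a seekable binary stream.

    Returns (fields, header_size), where fields maps field names to values.
    """
    # The first line is the magic string, the second the header size in bytes.
    magic = stream.readline().strip()
    if magic != b"NIST_1A":
        raise ValueError("not a NIST SPHERE file")
    header_size = int(stream.readline().strip())

    # Re-read the whole fixed-size header; audio samples start right after it.
    stream.seek(0)
    text = stream.read(header_size).decode("ascii", errors="replace")

    fields = {}
    for line in text.splitlines()[2:]:
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)
        if len(parts) != 3 or line.startswith(";"):
            continue  # skip blank lines, comments, and malformed lines
        name, ftype, value = parts
        if ftype == "-i":
            fields[name] = int(value)
        elif ftype == "-r":
            fields[name] = float(value)
        else:  # "-sN": a string of N characters
            fields[name] = value
    return fields, header_size


# Synthetic header for illustration only; these values are NOT taken from
# the lexicon's files.
DEMO = (
    b"NIST_1A\n   1024\n"
    b"sample_rate -i 16000\n"
    b"channel_count -i 1\n"
    b"sample_coding -s3 pcm\n"
    b"end_head\n"
).ljust(1024, b" ")

demo_fields, demo_size = parse_sphere_header(io.BytesIO(DEMO))
print(demo_fields, demo_size)
```

For a file on disk, the same function can be given a handle opened in
binary mode, e.g. open(path, "rb"), where path is whatever local copy
of a SPHERE file is at hand.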

LDC-Online provides a CGI interface to these speech
files. This interface allows keyword searching and
alphabetical browsing. The interface displays the
words with phonetic transcriptions and frequency
counts, and also plays them in several audio
formats, including AU, WAV, AIFF, and LDC wave, or
through a Java WaveView applet.

LDC plans to release a CD-ROM version of this spoken
lexicon later in Membership Year 1999; it will be
available at no cost to 1999 members. The current
online beta version is available free of charge for
research purposes to LDC members and others.

LDC-Online, as well as information about LDC and its
available resources, can be accessed on the LDC WWW
Home Page at URL:

http://www.ldc.upenn.edu/

If you need further information, please call (215)
898-0464 or send email to ldc@ldc.upenn.edu.

----------------------------------------------------------------------
Linguistic Data Consortium       Phone: (215) 898-0464
3615 Market Street               Fax:   (215) 573-2175
Suite 200                        email: ldc@unagi.cis.upenn.edu
Philadelphia, PA 19104-2608      www:   http://www.ldc.upenn.edu