New Release from the LDC

LDC Office (ldc@unagi.cis.upenn.edu)
Wed, 25 Sep 1996 08:39:25 EDT

Announcing a NEW RELEASE from the
LINGUISTIC DATA CONSORTIUM

The Resource Management-Word Data
Continuous Speech Database (RM1)

Isolated and Spelled Word Data

This CD-ROM contains previously-unreleased isolated-word and
spell-mode (spelled out words) speech data from the (D)ARPA Resource
Management (RM1) Corpus. This data is based on a 600-word subset of
the 991-word RM1 vocabulary and contains spoken and spelled words
pertaining to the RM1 naval resource management task. This corpus
was collected at the same time as the RM1 Continuous Speech Corpus
(NIST Speech Discs 2-1 through 2-4), as part of the same effort, and
contains speech from the same sets of subjects used in RM1.

The speech data has been segmented into separate spelled and
spoken-word waveform files for each subject-word-utterance.
Time-aligned word and phonetic transcriptions have been generated
automatically using forced recognition and are included. The
time-aligned transcriptions employ the same format and phone set as
the TIMIT Acoustic-Phonetic Continuous Speech Corpus (NIST Speech
Disc 1-1).
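
For readers who have not worked with TIMIT-style transcriptions, the
following is a minimal sketch of how such a file might be read. It
assumes the usual TIMIT layout of one segment per line, given as a
start sample, an end sample, and a label. The function name, the
Segment type, and the default sample rate are illustrative assumptions
and are not taken from the disc's documentation; check the sample rate
against the waveform headers before relying on the converted times.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    """One time-aligned unit (a word or a phone). Hypothetical type."""
    start_s: float
    end_s: float
    label: str


def parse_timit_alignment(text: str, sample_rate_hz: int = 16000) -> List[Segment]:
    """Parse TIMIT-style lines of the form '<start> <end> <label>'.

    Start and end are sample indices; they are converted to seconds
    using sample_rate_hz, which is an assumed default, not a value
    stated in the announcement.
    """
    segments = []
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        start, end, label = parts
        segments.append(
            Segment(int(start) / sample_rate_hz,
                    int(end) / sample_rate_hz,
                    label.strip())
        )
    return segments
```

Each returned Segment carries its start and end time in seconds, so
word and phone transcriptions for the same utterance can be compared
directly.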

As with the continuous speech portion of RM1, this data is divided
into speaker-independent and speaker-dependent partitions. These
data sets are further partitioned into training, development-test, and
evaluation-test subsets.

Texas Instruments recruited the subjects and collected the speech.
The National Institute of Standards and Technology (NIST) segmented
the waveforms, generated the time-aligned transcriptions and produced
this CD-ROM.

Institutions that have membership in the LDC during the 1996
Membership Year will be able to receive RM1 Word Data at no additional
charge, in the same manner as all other text and speech corpora
published by the LDC.

Nonmembers can receive a copy of RM1 Word Data for research purposes
only for a fee of $100. If you would like to order a copy of this
corpus, please email your request to ldc@ldc.cis.upenn.edu. If you
need additional information before placing your order, or would like
to inquire about membership in the LDC, please send email or call
(215) 898-0464.

Further information about the LDC and its available corpora can be
accessed on the Linguistic Data Consortium WWW Home Page at URL
http://www.ldc.upenn.edu/. Information is also available via ftp
at ftp.cis.upenn.edu under pub/ldc; for ftp access, please use
"anonymous" as your login name, and give your email address when asked
for a password.
