Corpora: New Corpus

LDC Office (ldc@unagi.cis.upenn.edu)
Fri, 30 Jul 1999 17:06:54 EDT

****************************************************
Tactical Speaker Identification Speech Corpus (TSID)
****************************************************

LDC is pleased to announce the release of the
Tactical Speaker Identification Speech Corpus (TSID),
which was collected by Douglas Reynolds and Gerald C.
O'Leary of MIT Lincoln Labs. It contains recordings
of 35 speakers (4 female, 31 male), made with a variety
of radio transmitters and receivers. The
recording sessions were conducted by assembling the
speakers into 7 groups of 5, then having each speaker
perform the following tasks:

- read a list of TIMIT sentences
- read a list of digit strings
- give directions for traveling from one point to another using a map
  (unscripted map task)

Each speaker performed this set of tasks on each of
three transmitters (xmtr1-3), and the utterances were
recorded simultaneously on DAT recorders attached to
each of six receivers (rcvr1-6), which were located
at some distance (well out of ear-shot) from the
transmitter. Recordings were also made at the same
time on a DAT recorder near the speaker, using a
head-mounted microphone, to provide a reference
wide-band recording of the speech (refwb).

As a result, the corpus is organized along four
dimensions: speaker, transmitter, receiver, and
speaking task. This organization can be viewed as a
four-dimensional matrix with 35x3x7x3 cells, where
the seven receiver entries are rcvr1-6 plus refwb.
Because of occasional mishaps and malfunctions during
the collection, some cells in this matrix are either
empty or only partially full.
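
For readers who want to check their copy for empty cells,
the matrix can be enumerated with a short script. This is
only a sketch: the transmitter and receiver labels come
from the description above, but the speaker IDs (spkr01
to spkr35) and the task names (timit, digits, map) are
assumptions made for illustration, not the labels used on
the released discs.

```python
from itertools import product

# Labels taken from the announcement: three transmitters, and six
# receivers plus the wide-band reference recording.
TRANSMITTERS = [f"xmtr{i}" for i in range(1, 4)]
RECEIVERS = [f"rcvr{i}" for i in range(1, 7)] + ["refwb"]

# Hypothetical labels: the announcement does not say how speakers
# or tasks are named in the released corpus.
SPEAKERS = [f"spkr{i:02d}" for i in range(1, 36)]
TASKS = ["timit", "digits", "map"]


def all_cells():
    """Return every (speaker, transmitter, receiver, task) combination."""
    return list(product(SPEAKERS, TRANSMITTERS, RECEIVERS, TASKS))


def missing_cells(present):
    """Return the cells that are absent from `present`, a set of cell tuples."""
    return [cell for cell in all_cells() if cell not in present]


if __name__ == "__main__":
    print(len(all_cells()))  # 35 * 3 * 7 * 3 = 2205
```

Passing missing_cells() the set of cells actually found in
a copy of the corpus would list the empty ones; how those
cells map onto file names depends on the corpus layout,
which is not described here.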

In addition to the tasks listed above, three pairs of
speakers also participated in a two-way map task
using xmtr3. In this task, one speaker gives
directions to the other for tracing a route on a map,
and both speakers are recorded on a single audio
channel at each of the receivers. The "refwb"
recording is the exception: the two speakers were
separated by some distance and used radio
communication to perform the task, and only one of
them used a head-mounted microphone and local DAT
recorder for wide-band recording.

Institutions that have membership in the LDC during
the 1999 Membership Year will be able to receive this
corpus free of charge. Nonmembers may purchase TSID
for $2000.

If you would like to order a copy of this corpus,
please email your request to
<ldc@unagi.cis.upenn.edu>. If you need additional
information before placing your order, or would like
to inquire about membership in the LDC, please send
email or call (215) 898-0464.

Further information about the LDC and its available
corpora can be accessed on the Linguistic Data
Consortium WWW Home Page at URL:

http://www.ldc.upenn.edu/

----------------------------------------------------------------------
Linguistic Data Consortium      Phone: (215) 898-0464
3615 Market Street              Fax:   (215) 573-2175
Suite 200                       email: ldc@unagi.cis.upenn.edu
Philadelphia, PA 19104-2608     www:   http://www.ldc.upenn.edu