Corpora: A New Release From the LDC

LDC Office (ldc@unagi.cis.upenn.edu)
Mon, 29 Jun 1998 15:46:31 EDT

Announcing a NEW RELEASE from the
Linguistic Data Consortium

************************************************
TAIWANESE PUTONGHUA SPEECH AND TRANSCRIPT CORPUS
************************************************

This set of data on Taiwanese accented Putonghua
(PTH) was recorded in Taiwan from December 1994 to
January 1995. Taiwanese accented PTH refers to PTH
spoken by people who were born in Taiwan and whose
first language is Taiwanese (Southern Min). A total
of 40 speakers, varying in age, education, birthplace,
and family dialect, were recorded. There were
5 two-speaker dialogues and 30 single-speaker
monologues. The dialogues were about 20 minutes each
and the monologues were about 10 minutes each.
Dialogues were recorded on two tracks, one for each
speaker. Monologues were recorded on one track.
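
As a quick consistency check on the figures above, the speaker count and
the approximate amount of recorded speech can be worked out as follows.
This is only a sketch: the durations are the "about 20" and "about 10"
minute figures quoted in this announcement, not measured values, and the
variable names are illustrative.

```python
# Figures taken from the announcement; durations are approximate.
DIALOGUES = 5
SPEAKERS_PER_DIALOGUE = 2
DIALOGUE_MINUTES = 20      # "about 20 minutes each"
MONOLOGUES = 30
MONOLOGUE_MINUTES = 10     # "about 10 minutes each"

# Each dialogue contributes two speakers, each monologue one.
total_speakers = DIALOGUES * SPEAKERS_PER_DIALOGUE + MONOLOGUES

# Wall-clock recording time across all sessions.
session_minutes = DIALOGUES * DIALOGUE_MINUTES + MONOLOGUES * MONOLOGUE_MINUTES

# Audio time counted per track (dialogues have one track per speaker).
track_minutes = (DIALOGUES * SPEAKERS_PER_DIALOGUE * DIALOGUE_MINUTES
                 + MONOLOGUES * MONOLOGUE_MINUTES)

print(total_speakers)   # 40
print(session_minutes)  # 400
print(track_minutes)    # 500
```

The speaker count agrees with the total of 40 stated above.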

The recordings were made in ordinary but quiet
rooms. The speakers were asked in advance to speak in
conversation style, without notes, on any topic they
chose, or no topic at all. Most speakers spoke
spontaneously and the topic drifted freely. Some
speakers talked about their professional work in a
rather formal way. One speaker (#20, a public health
official) used notes. We consider this variation in
speech style a merit of the data.

The recording equipment consisted of a portable DAT
recorder (Teac), which recorded at a 44.1 kHz sampling
rate with 16-bit linear quantization. The microphones
were AudioTechnica lapel microphones with a preamp and
an XLR connection to the DAT. The XLR connection helped
keep recording noise low, and the AudioTechnica
microphones provided a wide-bandwidth, flat response
over the speech range of interest, were unidirectional
to minimize cross-talk, and were very light in
comparison with standard microphones.
Both single-speaker monologues and
two-speaker dialogues were recorded using this system
on standard DAT tape.
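
For a rough sense of scale, the recording parameters above imply the
following raw data rates. This is a sketch under a stated assumption:
it treats the audio as uncompressed linear PCM at the original
44.1 kHz, 16-bit recording format. The announcement does not say in
what format the corpus is distributed, and the function name
pcm_bytes is illustrative only.

```python
SAMPLE_RATE_HZ = 44_100    # sampling rate stated in the announcement
BYTES_PER_SAMPLE = 2       # 16-bit linear quantization


def pcm_bytes(minutes, tracks=1):
    """Size in bytes of uncompressed PCM audio of the given length.

    Assumes the 44.1 kHz / 16-bit recording format is kept as-is;
    the distributed corpus may use a different format.
    """
    seconds = minutes * 60
    return SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * seconds * tracks


# One monologue of about 10 minutes, recorded on one track.
print(pcm_bytes(10))            # 52920000

# One dialogue of about 20 minutes, recorded on two tracks.
print(pcm_bytes(20, tracks=2))  # 211680000
```

Under the same assumption, each track amounts to roughly 5.3 MB per
minute of speech.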

Before recording, all speakers read and signed the
'Informed Consent Form', which was written in Chinese
and which largely followed the standard format
approved by the Human Subject Committee of the
University of Michigan. The form stated that
participation in the recording was entirely voluntary
and that the speech might be used for linguistic
teaching and research purposes.

The speech data are accompanied by transcripts. The
monologues have start and end time stamps. The 5
dialogues are time-stamped by speaker turn.

Institutions that have membership in the LDC during
the 1998 Membership Year will be able to receive this
corpus in the same manner as all other text and
speech corpora published by the LDC.

Nonmembers can receive a copy of the Taiwanese
Putonghua Speech and Transcript Corpus for $750.

If you would like to order a copy of this corpus,
please email your request to
<ldc@unagi.cis.upenn.edu>. If you need additional
information before placing your order, or would like
to inquire about membership in the LDC, please send
email or call (215) 898-0464.

Further information about the LDC and its available
corpora can be accessed on the Linguistic Data
Consortium WWW Home Page at URL:

http://www.ldc.upenn.edu/

Information is also available via ftp at
ftp.cis.upenn.edu under pub/ldc; for ftp access,
please use "anonymous" as your login name, and give
your email address when asked for a password.