Corpora: New Collection from the Linguistic Data Consortium

LDC Office (ldc@unagi.cis.upenn.edu)
Wed, 27 Aug 1997 18:40:25 EDT

Announcing a NEW RELEASE from the
LINGUISTIC DATA CONSORTIUM

CALLFRIEND Collection in 12 Languages
and 3 Dialect Comparisons

The CALLFRIEND project supports the development of language
identification technology. Calls were collected in the following
languages: American English, Canadian French, Egyptian Arabic, Farsi,
German, Hindi, Japanese, Korean, Mandarin, Spanish, Tamil, and
Vietnamese. Two major dialect groups were collected for each of English,
Mandarin, and Spanish. The dialect comparison groups are southern
vs. non-southern American English, Caribbean Spanish vs. non-Caribbean
Spanish, and Mainland Mandarin (China) vs. Mandarin as spoken in
Taiwan.

Each CALLFRIEND corpus consists of 60 unscripted telephone
conversations, each lasting between 5 and 30 minutes. The corpora also
include documentation describing speaker information (sex, age,
education, callee telephone number) and call information (channel
quality, number of speakers).

For each conversation, both the caller and callee are native speakers
of the designated language. All calls are domestic and were placed
inside the continental United States, Canada, Puerto Rico, or the
Dominican Republic.

Institutions that have membership in the LDC for either the 1996 or
1997 Membership Year will be able to receive the CALLFRIEND materials
at no additional charge, in the same manner as all other speech
corpora published by the LDC.

Nonmembers can purchase CALLFRIEND materials for research purposes
only. The cost of the CALLFRIEND collection is $600 per language or
per dialect. If you would like to order any of these corpora, please
email your request to ldc@unagi.cis.upenn.edu. If you need additional
information before placing your order, or would like to inquire about
membership in the LDC, please send email or call (215) 898-0464.

Further information about the LDC and its available corpora can be
accessed on the Linguistic Data Consortium WWW Home Page at URL
http://www.ldc.upenn.edu/. Information is also available via ftp
at ftp.cis.upenn.edu under pub/ldc; for ftp access, please use
"anonymous" as your login name, and give your email address when asked
for a password.

LDC96S46 CALLFRIEND American English-Non-Southern Dialect
LDC96S47 CALLFRIEND American English-Southern Dialect
LDC96S48 CALLFRIEND Canadian French
LDC96S49 CALLFRIEND Egyptian Arabic
LDC96S50 CALLFRIEND Farsi
LDC96S51 CALLFRIEND German
LDC96S52 CALLFRIEND Hindi
LDC96S53 CALLFRIEND Japanese
LDC96S54 CALLFRIEND Korean
LDC96S55 CALLFRIEND Mandarin Chinese-Mainland Dialect
LDC96S56 CALLFRIEND Mandarin Chinese-Taiwan Dialect
LDC96S57 CALLFRIEND Spanish-Caribbean Dialect
LDC96S58 CALLFRIEND Spanish-Non-Caribbean Dialect
LDC96S59 CALLFRIEND Tamil
LDC96S60 CALLFRIEND Vietnamese