Corpora: ELRA News - New resources 1/2

Valerie Mapelli (info-elra@calva.net)
Thu, 11 Jun 1998 17:16:52 +0200 (MET DST)

[ We apologise for the duplicate posting of this announcement ]

EUROPEAN LANGUAGE RESOURCES ASSOCIATION
ELRA News
=====================================

*** ELRA NEW RESOURCES - Part 1 ***

The ELRA catalogue has been updated with the following resources.

********************************************
* ELRA-S0050 Russian speech database (STC) *
********************************************

The STC Russian speech database was recorded in 1996-1998. The main
purpose of the database is to investigate individual speaker variability and to
validate speaker recognition algorithms. The database was recorded through a
16-bit Vibra-16 Creative Labs sound card with an 11,025 Hz sampling rate.

The database contains Russian read speech from 89 different speakers (54 male,
35 female), including 70 speakers with 15 sessions or more, 10 speakers with
10 to 14 sessions and 9 speakers with fewer than 10 sessions. The speakers
were recorded in Saint-Petersburg and are aged 18 to 62. All are
native speakers.

The corpus consists of 5 sentences. Each speaker reads each sentence carefully
but fluently 15 times on different dates over a period of 1-3 months. The
corpus contains a total of 6,889 utterances and comprises 2 volumes, with a
total size of 700 MB of uncompressed data. The signal of each utterance is
stored as a separate file (approx. 126 KB). The total size of the data for one
speaker is approximately 9,500 KB. The average utterance duration is about
5 sec.
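
As a rough cross-check of the figures above, the relation between file size and
duration for uncompressed 16-bit audio at 11,025 Hz can be sketched as follows.
This is only an illustration: the function names are hypothetical, single-channel
(mono) recording is an assumption not stated in the announcement, and 1 KB is
taken to be 1,024 bytes.

```python
SAMPLE_RATE_HZ = 11_025   # sampling rate stated in the announcement
BYTES_PER_SAMPLE = 2      # 16-bit linear samples
CHANNELS = 1              # assumption: mono recordings


def utterance_duration_s(file_size_bytes: int) -> float:
    """Duration in seconds of a headerless PCM file of the given size."""
    return file_size_bytes / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)


def utterance_size_bytes(duration_s: float) -> int:
    """Size in bytes of a headerless PCM file of the given duration."""
    return round(duration_s * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)


print(utterance_size_bytes(5.0))                    # 110250
print(round(utterance_duration_s(126 * 1024), 2))   # 5.85
```

Under these assumptions a 5-second utterance occupies about 108 KB, and a
126 KB file corresponds to a little under 6 seconds of speech.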

A file gives information about the speakers (age and gender). The
orthographic and phonetic transcriptions of the corpus are given in separate
files, which contain the prompted sentences and their transcription in IPA. The
signal files are raw files without any header: 16 bits per sample, linear,
11,025 Hz sampling frequency.
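
Because the signal files carry no header, a reader has to supply the format
parameters itself. The following is a minimal sketch using only the Python
standard library; the function names are hypothetical, and both the byte order
(little-endian, as is usual for PC sound cards) and the channel count (mono) are
assumptions, since the announcement does not state them.

```python
import array
import sys
import wave

SAMPLE_RATE_HZ = 11_025   # stated in the announcement
SAMPLE_WIDTH_BYTES = 2    # 16 bits per sample, linear
CHANNELS = 1              # assumption: mono
# Assumption: samples are stored little-endian.


def read_raw_pcm(path):
    """Return the samples of a headerless 16-bit PCM file as signed integers."""
    samples = array.array("h")  # signed 16-bit integers
    with open(path, "rb") as f:
        samples.frombytes(f.read())
    if sys.byteorder != "little":
        samples.byteswap()  # convert little-endian data on big-endian hosts
    return samples


def raw_to_wav(raw_path, wav_path):
    """Wrap a headerless PCM file in a WAV container without altering samples."""
    with open(raw_path, "rb") as f:
        data = f.read()
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(CHANNELS)
        w.setsampwidth(SAMPLE_WIDTH_BYTES)
        w.setframerate(SAMPLE_RATE_HZ)
        w.writeframes(data)  # 16-bit WAV data is little-endian as well
```

If the files turn out to be big-endian, the samples would need to be
byte-swapped before being written to the WAV container.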

The recording conditions were as follows:
· Microphone: dynamic omnidirectional high-quality microphone, distance
to mouth 5-10 cm
· Environment: office room
· Sampling rate: 11,025 Hz
· Resolution: 16 bit
· Sound board: Creative Labs Vibra-16

Means of delivery: CD-ROM

Price for ELRA members:
for research use: 400 ECU
for commercial use: 2000 ECU

Price for non members:
for research use: 800 ECU
for commercial use: 4000 ECU

*********************************************
* ELRA-S0051 German SpeechDat(II) FDB 1000 *
*********************************************

The German SpeechDat(II) FDB 1000 consists of 988 calls over the German
fixed network, stored on 4 CD-ROMs in the final SpeechDat(II) database
exchange format. The speech databases made within the SpeechDat(II)
project were validated by SPEX, the Netherlands, to assess their compliance
with the SpeechDat format and content specifications.

The following items were recorded:
· 1 isolated digit (read or prompted)
· 1 sequence of 10 isolated digits
· 4 connected digits
· 4-6 digit number to identify the prompt sheet
· ca. 10 digit telephone number (read)
· 14-16 digit credit card number (read, 150 different credit card numbers
were found)
· 6 digit PIN code (read)
· 1 natural number (read)
· 1 money amount (read)
· 3 spelled words (1 spontaneous name spelling, 2 read)
· 1 time of day (spontaneous)
· 1 time phrase (read)
· 1 date (spontaneous)
· 1 date (read)
· 1 relative date (read)
· 2 yes/no questions (spontaneous, not prompted)
· 3/6 common application words (read)

All application words are recorded more than 80 times. These are:
· 1 application word phrase
· 9 phonetically rich sentences (read)
· 4 phonetically rich words (read)
· 5 directory assistance names: 1 spontaneous name (e.g. forename), 1
spontaneous city name, 1 read city name (from a list of the 500 most
frequent), 1 read company/agency name (from a list of the 500 most frequent),
1 read proper name, fore- and surname (from a list of 150 SDB names)

· Price for research use (in ECU)        Members    Non members
German SpeechDat(II) FDB-1000             15,000         25,000
German SpeechDat(II) FDB-1000
+ German SpeechDat(M) DB1 or DB2          20,000         30,000

· Price for commercial use (in ECU)      Members    Non members
German SpeechDat(II) FDB-1000             18,000         25,000
German SpeechDat(II) FDB-1000
+ German SpeechDat(M) DB1 or DB2          25,000         35,000

SPECIAL OFFERS:

1) Price of German SpeechDat(II) FDB-1000 for ELRA members who have
already purchased German SpeechDat(M) DB1 (ELRA-S0018):

· Before 30.06.1998: 10,000 ECU
· Between 30.06.1998 and 31.12.1998: 11,000 ECU

2) If the purchase of SpeechDat(II) FDB-1000 occurs in the same calendar
year as the purchase of DB1 or DB2, the package price will be:
· for research use: 20,000 ECU for ELRA members and 30,000 ECU for non
members;
· for commercial use: 25,000 ECU for ELRA members and 35,000 ECU for
non members.

********************************************
For more information, please contact:
ELRA/ELDA
55-57 rue Brillat Savarin
75013 PARIS
Tel: +33 1 43 13 33 33
Fax: +33 1 43 13 33 30
E-mail: info-elra@calva.net
http://www.icp.grenet.fr/ELRA/home.html
********************************************