ELRA News, Message 1/3

Valérie Mapelli (info-elra@calva.net)
Fri, 7 Feb 1997 17:31:58 +0100 (MET)

INFORMATION FROM THE
European Language Resources Association
ELRA News
=====================================

[ We apologise for the duplicate posting of this announcement ]

This message is the first of three. The following two will elaborate on
ELRA Written and Terminological Resources.

ELRA Language Resources catalogue
=====================================

Since our last newsletter, the ELRA catalogue has grown. At present, the
catalogue consists of:

1) Spoken resources : 34 databases (recordings from microphone, telephone,
continuous speech, isolated words, several languages, etc.).

2) Written resources :
* 13 monolingual and multilingual corpora
* 20 monolingual lexica
* Around 40 multilingual lexica
* A linguistic software platform and a grammar development platform

3) Terminological resources : over 90 databases with a wide range of domains
and several languages (French, English, German, Spanish, Danish, Italian,
Catalan, Turkish, Polish, Portuguese).

Speech resources are listed below with a brief description. Descriptions of
the Written and Terminological resources will be mailed in two further
messages.

The *** marker indicates resources that are still under negotiation.

=====================================
SPEECH AND RELATED RESOURCES
=====================================

ELRA-S0001 ACCOR - Acoustic and articulatory multilingual database (7
languages) recorded as part of the ESPRIT-ACCOR project investigating
cross-language acoustic-articulatory correlations in coarticulatory
processes. Only English is available.
=====================================
ELRA-S0002 AIDA-1 - This Italian corpus is made up of several sets of
phonetically dense meaningless words and digits from 0 to 9, recorded by 20
male & 20 female speakers (8 of them repeated the data 5 times).
=====================================
ELRA-S0003 BDLEX 23000 - A phonetically transcribed French lexicon of 23,000
canonical entries (leading to over 270,000 forms) with the corresponding
graphemical, phonological and morphosyntactical attributes.
=====================================
ELRA-S0004 BDLEX 50000 - A phonetically transcribed French lexicon of 50,000
canonical entries (leading to over 450,000 forms) with the same information
as BDLEX-23000.
=====================================
ELRA-S0005 BDSONS (Base de Données des Sons du Français) - French & Canadian
French - Speech database with two subsets: evaluation (sentences, logatomes,
numbers, digits, etc.) & acoustic modelling (sequences of CVCV, various
types of sentences, etc.). The corpus consists of 16 male and 16 female
speakers.
=====================================
ELRA-S0006 BREF sub-corpus BREF-80 - 5,330 sentences read by 80 French
speakers. Texts were selected from the French newspaper Le Monde (over
20,000 words).
=====================================
ELRA-S0007 BREF sub-corpus BREF-Polyglot - 3,193 sentences read by 6 French
speakers. The sentences were selected to cover a wide range of phonetic
contexts.
=====================================
ELRA-S0008 COLLECT - 500 speakers, half of whom called from Turin and the
other half from all over Italy, automatically prompted to utter the 10
Italian digits and 5 command words.
=====================================
ELRA-S0009*** COST232 - Multi-English Speech database - 797 successful calls
received in Italy and the UK, using different types of collecting equipment.
Repetition of the same vocabulary - the "TI (Texas Instruments) words"
(digits + yes, no, go, etc.).
=====================================
ELRA-S0010 Dutch Polyphone Database - Telephone speech from 5,050 Dutch
speakers. Approx. 44 items per speaker. Read & spontaneous speech (isolated
words, digits, sentences, etc.).
=====================================
ELRA-S0011 English Polyphone Database (SpeechDat(M)) - DB1 (Phonetically
rich sentences & application oriented utterances such as keywords, digits,
etc.) - 1,000 speakers recorded over digital telephone lines using fixed
telephone sets.
=====================================
ELRA-S0012 English Polyphone Database (SpeechDat(M)) - DB2 (The phonetically
rich sentences sub-set) - see ELRA-S0011
=====================================
ELRA-S0013 Erlanger Bahnansage - ERBA - Over 10,000 utterances read by over
100 German speakers. Domain of train inquiries.
=====================================
ELRA-S0014*** EUROM1 (The Multilingual European Speech Database) - The first
really multilingual speech database produced in Europe. Over 60 speakers per
language who pronounced numbers, sentences and isolated words, using a
close-talking microphone.
=====================================
ELRA-S0015 EUROM1I - The Italian release of EUROM1
=====================================
ELRA-S0016 FRESCO - DB1 (Phonetically rich sentences & application oriented
utterances such as keywords, digits, etc.) - French SpeechDat (Polyphone)
database containing 35,000 utterances from 1,000 callers over the telephone
in France.
=====================================
ELRA-S0017 FRESCO - DB2 (The Phonetically rich sentences sub-set) - see
ELRA-S0016
=====================================
ELRA-S0018 German Polyphone Database (SpeechDat(M)) - DB1 (Phonetically rich
sentences & application oriented utterances such as keywords, digits, etc.)
- German read and spontaneous speech from 1,000 speakers.
=====================================
ELRA-S0019 German Polyphone Database (SpeechDat(M)) - DB2 (The phonetically
rich sentences sub-set) - see ELRA-S0018
=====================================
ELRA-S0020 GRONINGEN - Over 20 hours of Dutch read speech material (short
texts, short sentences, etc.), from 238 speakers.
=====================================
ELRA-S0021 M2VTS Multi Modal Verification for Teleservices and Security
applications project - Multilingual database designed to facilitate access
control using multimodal identification of human faces (speech & images).
=====================================
ELRA-S0022*** Onomastica Multi-Language Pronunciation Dictionaries -
Covering city & town names, street names, family names, first names, product
names, for 11 European languages.
=====================================
ELRA-S0023 PHONDAT1 - PD1 (2nd edition) - Read speech from 201 German
speakers who read 450 different sentences each. Eight of them read the whole
sentence corpus.
=====================================
ELRA-S0024 PHONDAT2 - PD2 (2nd edition) - 200 different sentences from a
train inquiry task read by 16 German speakers, provided with phonological
segmentation by hand plus other labelling.
=====================================
ELRA-S0025 SIEMENS 100 - SI100 - Approx. 100 sentences extracted from the
German newspaper Süddeutsche Zeitung and read by 101 speakers.
=====================================
ELRA-S0026 SIEMENS 1000 - SI1000 - Approx. 1,000 sentences extracted from
the German newspaper Süddeutsche Zeitung and read by 10 speakers.
=====================================
ELRA-S0027 SieTill (Siemens Tillman) Telephone Speech Database - German
database with 730 speakers (338 female, 392 male), and 36,000 utterances
(digit sequences, dates, spelled names, etc.).
=====================================
ELRA-S0028*** SIVA - Speech Database for Speaker Verification and
Identification - Over 2,000 calls in Italian collected over the
telephone.
=====================================
ELRA-S0029 Strange Corpus 1 - SC1 (Accents) - 'Nordwind und Sonne' story
read by 72 speakers with foreign accents and 16 native German speakers.
=====================================
ELRA-S0030*** Swiss-French Polyphone Database - 5,000 speakers answered
around 10 questions leading to spontaneous speech and read about 28 items
from a form supplied by IDIAP.
=====================================
ELRA-S0031 TED (Translanguage English Database) - Recordings made of 188
oral presentations in English given at Eurospeech'93 in Berlin (high
percentage of non-native English speakers).
=====================================
ELRA-S0032 TEDPhone - Polyphone/SpeechDat-like recordings of 64 speakers in
English and in their native language.
=====================================
ELRA-S0033 BDBRUIT - Recordings of French speech, corrupted with
perturbations due to noisy environments, especially the Lombard effect.
=====================================
ELRA-S0034 VERBMOBIL (set of resources) - German spontaneous speech
databases recorded in a dialogue task.

=====================================
We remind you that the catalogue of language resources (LR) negotiated or
under negotiation by ELRA can be found on our Web site. For a quotation,
please refer to this site or contact the ELRA office directly.

=====================================
For further information :

ELRA/ELDA
87, Avenue d'Italie
FR-75013 PARIS
FRANCE
Tel : +33 01 45 86 53 00
Fax : +33 01 45 86 44 88
E-mail : info-elra@calva.net
WWW: http://www.icp.grenet.fr/ELRA/home.html
=====================================