Corpora: ELRA News

From: Valerie Mapelli (mapelli@elda.fr)
Date: Wed Jan 19 2000 - 16:37:36 MET


    [ We apologise for the duplicate posting of this announcement ]

    ___________________________________________________________
                                    ELRA
                    European Language Resources Association
                                   ELRA News
    ___________________________________________________________

                         *** ELRA NEW RESOURCES ***

    We are happy to announce a new resource available via ELRA:
            _______________________________________
            ELRA-S0076 French SpeechDat(II) FDB 5000
            _______________________________________

            The French SpeechDat(II) FDB-5000 comprises recordings of 5040
            French speakers made over the French fixed telephone network.
            40 speakers were added to the original 5,000 to meet the
            requirements of the database. The database is distributed on
            18 CDs, each containing 300 speaker sessions (except CD 4, which
            contains 100). The speech databases produced within the
            SpeechDat(II) project were validated by SPEX (the Netherlands)
            to assess their compliance with the SpeechDat format and
            content specifications.

            The speech files are stored as uncompressed sequences of 8-bit,
            8 kHz A-law samples. Each prompted utterance is stored in a
            separate file with an accompanying ASCII SAM label file.
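            For readers unfamiliar with A-law, the sketch below expands each
            8-bit code to a linear sample using the standard ITU-T G.711 A-law
            expansion. It assumes (this is not stated in the announcement) that
            each utterance file contains raw, headerless A-law bytes; the
            function names are hypothetical and not part of any ELRA tool.

```python
# Minimal G.711 A-law decoding sketch.
# ASSUMPTION: each utterance file holds raw, headerless A-law bytes.

SAMPLE_RATE_HZ = 8000  # 8 kHz, one 8-bit code per sample


def alaw_to_linear(a_val: int) -> int:
    """Expand one 8-bit A-law code to a signed 16-bit-range linear sample."""
    a_val ^= 0x55                # undo the alternate-bit inversion used by A-law
    t = (a_val & 0x0F) << 4      # 4-bit quantisation step within the segment
    seg = (a_val & 0x70) >> 4    # 3-bit segment number
    if seg == 0:
        t += 8
    elif seg == 1:
        t += 0x108
    else:
        t += 0x108
        t <<= seg - 1
    return t if (a_val & 0x80) else -t  # sign bit set means a positive sample


def decode_alaw_bytes(data: bytes) -> list[int]:
    """Decode a whole raw A-law buffer into linear PCM samples."""
    return [alaw_to_linear(b) for b in data]


def duration_seconds(data: bytes) -> float:
    """8-bit samples at 8 kHz: 8000 bytes correspond to one second of speech."""
    return len(data) / SAMPLE_RATE_HZ
```

            Reading a file with open(path, "rb").read() and passing the bytes
            to decode_alaw_bytes gives the linear samples; since the files are
            uncompressed, the duration follows directly from the file size.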

            The following items were recorded:
            - 5 application words;
            - 1 sequence of 10 isolated digits;
            - 4 connected-digit strings: 1 sheet number (5+ digits), 1 telephone
            number (9-11 digits), 1 credit card number (14-16 digits), 1 PIN code
            (6 digits);
            - 3 dates: 1 spontaneous date (e.g. birthday), 1 prompted date (word
            style), 1 relative and general date expression;
            - 2 word spotting phrases using an application word (embedded);
            - 1 isolated digit;
            - 3 spelled-out words (letter sequences): 1 spontaneous, e.g. own
            forename; 1 spelling of directory assistance city name; 1 real/artificial
            name for coverage;
            - 1 currency amount;
            - 1 natural number;
            - 5 directory assistance names + 1 spelled-out name: 1 spontaneous name,
            e.g. own forename; 1 city of birth / hometown (spontaneous); 1 most
            frequent city (out of 500); 1 most frequent company/agency (out of 500);
            1 "forename surname"; 1 spelled-out city of birth;
            - 2 questions, including "fuzzy" yes/no: 1 predominantly "yes" question,
            1 predominantly "no" question;
            - 9 phonetically rich sentences;
            - 2 time phrases: 1 time of day (spontaneous), 1 time phrase (word style);
            - 8 phonetically rich words.

            The speakers are distributed by age as follows: 215 are under 16,
            2531 are between 16 and 30, 1208 are between 31 and 45, 910 are
            between 46 and 60, and 176 are over 60.
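            As a quick consistency check, the five age groups can be tallied
            against the 5040 speakers stated above. The percentage shares are
            derived here for illustration and are not part of the original
            announcement; the variable names are hypothetical.

```python
# Speaker counts per age group, copied from the announcement.
AGE_GROUPS = {
    "under 16": 215,
    "16-30": 2531,
    "31-45": 1208,
    "46-60": 910,
    "over 60": 176,
}

TOTAL_SPEAKERS = 5040  # database size stated in the announcement

# Sum the groups and derive each group's share of the total (in percent).
total = sum(AGE_GROUPS.values())
shares = {group: 100 * n / total for group, n in AGE_GROUPS.items()}
```

            The groups sum to exactly 5040, and the 16-30 group (2531 speakers)
            accounts for just over half of the database.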

            A pronunciation lexicon with phonemic transcriptions in SAMPA is
            also included.

    =====================================
    For further information, please contact:

         ELRA/ELDA
         55-57 rue Brillat-Savarin
         F-75013 Paris, France
         Tel:    +33 01 43 13 33 33
         Fax:    +33 01 43 13 33 30
         E-mail: mapelli@elda.fr

    or visit our Web site:

         http://www.icp.grenet.fr/ELRA/home.html
    =====================================


