[Corpora-List] ELRA News 1/2

From: Magali Jeanmaire (duclaux@elda.fr)
Date: Wed Apr 16 2003 - 17:02:33 MET DST

    ****************************************************************
    ELRA is happy to announce that new resources are
    available in its catalogue of language resources
    ****************************************************************
    You will find below the short descriptions of these new
    resources. We invite you to visit the on-line catalogue
    on our web site, at http://www.elda.fr or http://www.elra.info,
    to get more detailed descriptions.

    Please contact us if you would like to get more information.
    ****************************************************************
    Spoken Language Resources:

    - S0144 Italian SpeechDat-Car
    - S0113 Spoken Dutch Corpus: release 6

    AURORA Databases

    - Subset of Italian SpeechDat-Car database (AURORA/CD0003-05)
    - Aurora 4a and Aurora 4b databases

    ****************************************************************
    *** S0144 Italian SpeechDat-Car ***
    The Italian SpeechDat-Car database contains in-car recordings of
    300 Italian speakers, who uttered around 120 read and spontaneous
    items. Recordings were made through 5 different channels: 4 in-car
    microphones (1 close-talk microphone, 3 far-talk microphones) and
    1 channel over the GSM network.

    *** S0113 Spoken Dutch Corpus: Release 6 ***
    Release 6 of the Spoken Dutch Corpus has been published.
    This release includes sound files together with their orthographic
    transcripts, as well as various annotations such as POS tagging,
    lemmatization, and word segmentation.

    *** Subset of Italian SpeechDat-Car database (AURORA/CD0003-05) ***
    The Aurora project was originally set up to establish a worldwide standard
    for the feature extraction software which forms the core of the front-end of
    a DSR (Distributed Speech Recognition) system. ETSI formally adopted this
    activity as work items 007 and 008. The two work items within ETSI are:
    - ETSI DES/STQ WI007: Distributed Speech Recognition - Front-End Feature
    Extraction Algorithm & Compression Algorithm
    - ETSI DES/STQ WI008: Distributed Speech Recognition - Advanced Feature
    Extraction Algorithm.

    This database is a subset of the Italian SpeechDat-Car database, which was
    collected as part of the European Union funded SpeechDat-Car project.
    It contains 2200 Italian connected digit utterances divided into training
    and testing utterances in the following noise and driving conditions inside
    a car:
    - High speed good road
    - Low speed rough road
    - Stopped with motor running
    - Town traffic

    *** Aurora 4a & 4b ***
    The Aurora project is now releasing a number of list files for performing
    training and testing on the Wall Street Journal (WSJ0) data at two sampling
    rates: 8 kHz and 16 kHz. The Aurora 4a database is based on the WSJ0 data
    with artificial addition of noise over a range of signal-to-noise ratios.
    It contains both clean and multicondition training sets and 14 evaluation
    sets with different noise types and microphones.
    An additional database, Aurora 4b, will be released later and will contain
    noisy versions of the Nov'92 WSJ0 development set.
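
    As an illustration of what "artificial addition of noise" at a chosen
    signal-to-noise ratio can look like, here is a minimal Python sketch.
    It is not the Aurora project's own tooling: the function names, the
    synthetic sine-wave "speech", the Gaussian noise and the SNR values
    (20, 10 and 0 dB) are all assumptions made for this example.

```python
import math
import random


def power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Return speech plus noise, with the noise scaled to the requested SNR (dB).

    Hypothetical helper for illustration only; not part of any Aurora release.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = power(speech)
    p_noise = power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must both have non-zero power")
    # SNR_dB = 10 * log10(p_speech / (gain**2 * p_noise)), solved for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in signals: a 440 Hz tone sampled at 8 kHz and white Gaussian noise.
    speech = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]
    for snr_db in (20, 10, 0):
        noisy = mix_at_snr(speech, noise, snr_db)
        added = [y - x for x, y in zip(speech, noisy)]
        achieved = 10 * math.log10(power(speech) / power(added))
        print(f"target {snr_db} dB, achieved {achieved:.2f} dB")
```

    The sketch measures power over the whole signal. Real corpora may
    measure speech level differently (for example, over active speech
    only), so the resulting SNR figures are not directly comparable.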


