[Corpora-List] ELRA News

From: Magali Jeanmaire (duclaux@elda.fr)
Date: Tue Jun 08 2004 - 16:50:20 MET DST


    **********************************************************
    ELRA - Language Resources Catalogue - Update
    *********************************************************
    We are happy to announce that the following new Language Resources
    are now available in our catalogue.

    Short descriptions of these resources are given below.
    More detailed descriptions are available on our websites,
    at www.elda.fr or www.elra.info.
    -------------------------------------------
    Written Language Resources
    -------------------------------------------
    *** W0015 Le Monde Text Corpus - Update ***
    Electronic archiving of "Le Monde" articles started on 1 January 1987.
    The entire corpus is available in ASCII text format.
    The year 2003 is available in XML format.

    *** W0036/04 Le Monde Diplomatique Text corpus in Arabic ***
    Electronic archive of Arabic-language "Le Monde Diplomatique" articles
    from 1998. The corpus is available in ASCII text format.
    French and English versions are also available.

    -------------------------------------------
    Spoken Language Resources
    -------------------------------------------
    *** S0158 Turkish OrienTel database ***
    This speech database contains recordings of 1,700 Turkish speakers
    made over the Turkish fixed and mobile telephone networks.
    Each speaker uttered around 45 read and spontaneous items.

    *** S0159 German spoken by Turkish OrienTel database ***
    This speech database contains recordings of 332 Turkish speakers
    of German made over the German fixed and mobile telephone networks.
    Each speaker uttered around 53 read and spontaneous items.

    *** S0160 Spanish Speecon database ***
    The Spanish Speecon database comprises recordings of 561 adult and
    55 child Spanish speakers, who uttered over 290 and 210 items
    respectively (read and spontaneous).

    *** S0161 Russian Speecon database ***
    The Russian Speecon database comprises recordings of 550 adult and
    50 child Russian speakers, who uttered over 290 and 210 items
    respectively (read and spontaneous).

    *** S0162 Hempel ***
    This corpus contains 25.5 hours of recordings by 3,909 German speakers,
    totalling 184,240 spoken words, made over public telephone lines (fixed
    network only). The recordings are free monologues answering the question
    "Was haben Sie in der letzten Stunde gemacht?" (What did you do in the
    last hour?). The database conforms to the SpeechDat Exchange Format.
