[Corpora-List] C-ORAL-ROM spoken corpus

From: Jean Veronis (Jean.Veronis@up.univ-mrs.fr)
Date: Thu Jan 20 2005 - 21:47:37 MET


    The C-ORAL-ROM corpus is available from ELRA/ELDA.

    C-ORAL-ROM is a multilingual corpus of spontaneous speech for four
    Romance languages (French, Italian, Portuguese, Spanish) of around
    1,200,000 words (IST 2000-26228). The corpus consists of four
    comparable collections of recorded spontaneous speech sessions in
    Italian, French, Portuguese and Spanish (around 300,000 words for
    each language). The collections were provided, respectively, by the
    following providers:

        * Università di Firenze (Dipartimento di Italianistica, LABLITA);
        * Université de Provence (DELIC team, Description Linguistique
          Informatisée sur Corpus);
        * Fundação da Universidade de Lisboa/Centro de Linguística da
          Universidade de Lisboa;
        * Universidad Autónoma de Madrid (Departamento de Lingüística,
          Lenguas Modernas, Lógica y F. de la Ciencia, Laboratorio de
          Lingüística Informática).

    The C-ORAL-ROM corpus provides the acoustic source of each session
    together with the following main annotations:

        * The orthographic transcription, in CHAT format, enriched with the
          tagging of terminal and non-terminal prosodic breaks;
        * Session metadata;
        * The text-to-speech synchronization, in WIN PITCH CORPUS format,
          based on the alignment of each transcribed utterance. The WIN
          PITCH CORPUS software is provided with the resource.

    More details are available in the ELRA/ELDA Catalogue:

    http://www.elda.org/catalogue/en/speech/S0172.html

    -- 
    Jean Véronis
      Home: http://www.up.univ-mrs.fr/veronis
      Blog: http://aixtal.blogspot.com
