Corpora: ELRA News

From: Valerie Mapelli (mapelli@elda.fr)
Date: Tue Apr 11 2000 - 11:07:50 MET DST

    [ We apologise for the duplicate posting of this announcement ]
    ___________________________________________________________
                                    ELRA
                    European Language Resources Association
                                   ELRA News
    ___________________________________________________________

                         *** ELRA NEW RESOURCES ***

    We are happy to announce new resources available via ELRA:

    ELRA-S0058 RVG1 (Regional Variants of German 1)
    ELRA-S0081 Norwegian SpeechDat(II) FDB-1000
    ELRA-S0082 Siemens Synthesis Corpus - SI1000P
    ELRA-W0020 PAROLE French Corpus
    ELRA-W0022 ILSP/ELEFTHEROTYPIA Corpus (Greek corpus)
    ELRA-L0033 LusoLEX European Portuguese Lexicon
    ELRA-L0034 BrasiLEX Brazilian Portuguese lexicon

    A short description of each database is given below.

    _______________________________________
    ELRA-S0058 RVG1 (Regional Variants of German 1)
    _______________________________________

    The ELRA-S0058 RVG1 database has been extended with 421
    speakers recorded with high-quality microphones. More
    information about this database is available on the ELRA
    Web site.

    _______________________________________
    ELRA-S0081 Norwegian SpeechDat(II) FDB-1000
    _______________________________________

    The Norwegian SpeechDat(II) FDB-1000 comprises recordings of 1016
    Norwegian speakers (517 males, 499 females) made over
    the Norwegian fixed telephone network. The database
    was collected and annotated by Telenor Research and
    Development and is distributed on 4 CDs.
    The speech databases made within the SpeechDat(II) project were
    validated by SPEX, the Netherlands, to assess their compliance
    with the SpeechDat format and content specifications.
    Speech is stored as sequences of 8-bit A-law samples at 8 kHz.
    Each prompted utterance is stored in a separate file. Each signal
    file is accompanied by an ASCII SAM label file which contains the
    relevant descriptive information. A pronunciation lexicon with a
    phonemic transcription in SAMPA is also included.
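
    As an illustration of the signal format described above, the
    following minimal Python sketch expands 8-bit A-law bytes to
    16-bit linear PCM using the standard G.711 expansion. It assumes
    that a signal file is a headerless stream of A-law bytes, which
    this announcement does not confirm; the function names and the
    file name in the usage note are hypothetical.

```python
# Minimal sketch: expand 8-bit A-law (ITU-T G.711) bytes to 16-bit linear PCM.
# Assumption: the input is a raw, headerless stream of A-law bytes.

SAMPLE_RATE_HZ = 8000  # sampling rate stated in the database description


def alaw_to_linear(byte: int) -> int:
    """Expand one A-law byte to a signed 16-bit linear sample."""
    a = byte ^ 0x55                 # undo the even-bit inversion applied by the encoder
    mantissa = (a & 0x0F) << 4      # 4-bit quantisation step within the segment
    segment = (a & 0x70) >> 4       # 3-bit segment (exponent) number
    if segment == 0:
        magnitude = mantissa + 8
    elif segment == 1:
        magnitude = mantissa + 0x108
    else:
        magnitude = (mantissa + 0x108) << (segment - 1)
    # In A-law a set sign bit marks a positive sample.
    return magnitude if a & 0x80 else -magnitude


def decode_alaw(data: bytes) -> list[int]:
    """Decode a byte string of A-law samples into linear PCM values."""
    return [alaw_to_linear(b) for b in data]


def duration_seconds(data: bytes) -> float:
    """Duration of a raw A-law recording: one byte per sample at 8 kHz."""
    return len(data) / SAMPLE_RATE_HZ
```

    For example, the bytes of a file with the hypothetical name
    "utterance.alaw" could be read with open("utterance.alaw", "rb")
    and passed to decode_alaw(); one second of speech occupies 8000 bytes.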

    _______________________________________
    ELRA-S0082 Siemens Synthesis Corpus - SI1000P
    _______________________________________

    The SI1000P recordings were made to provide material for high
    quality concatenative speech synthesis. The corpus contains 1000
    newspaper sentences read by two professional German broadcasting
    announcers in studio quality, together with the laryngographic signal
    and the glottal pulse stream. Parts of the corpus were labelled and
    segmented phonemically (SAM-PA) and prosodically (borders + accents).

    _______________________________________
    ELRA-W0020 PAROLE French Corpus
    _______________________________________

    The PAROLE French corpus contains a total of 20 093 099 words,
    broken down as follows:
    Miscellaneous (CRATER, MLCC Multilingual and Parallel Corpora): 2 025 964 words
    Books (CNRS Editions): 3 267 409 words
    Periodicals (CNRS Info, Hermès): 942 963 words
    Newspapers (Le Monde, provided by ELRA): 13 856 763 words
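
    The breakdown can be checked against the stated total with a few
    lines of Python. This is only an illustrative sketch; the
    dictionary keys are shorthand labels for the four categories
    listed above.

```python
# Word counts per source category, as listed in the corpus description.
word_counts = {
    "Miscellaneous": 2_025_964,
    "Books": 3_267_409,
    "Periodicals": 942_963,
    "Newspapers": 13_856_763,
}

total = sum(word_counts.values())
print(total)  # 20093099, the total stated for the corpus

# Share of each category in the whole corpus, as a percentage.
for category, count in word_counts.items():
    print(f"{category}: {100 * count / total:.1f}%")
```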

    The resulting resources conform to the PAROLE format.

    _______________________________________
    ELRA-W0022 ILSP/ELEFTHEROTYPIA Corpus (Greek corpus)
    _______________________________________

    This corpus contains approximately 3 million words from the daily
    newspaper ELEFTHEROTYPIA, classified and annotated according to
    the common core PAROLE encoding standard. The corpus is distributed
    as SGML files. A subset of the corpus (250,000 words) is
    morpho-syntactically tagged; all the words are also lemmatised and checked.

    _______________________________________
    ELRA-L0033 LusoLEX European Portuguese Lexicon
    _______________________________________

    Multifunctional monolingual lexicon of the European variety of Portuguese,
    consisting of about 61,000 entries (lemmas) and 1,600 corresponding
    inflexion paradigms. The set of entries includes compound words, and
    the inflexion paradigms include information regarding enclitics,
    augmentatives and diminutives. Morphological information is encoded
    with maximum granularity and conforms to the EAGLES recommendations.

    _______________________________________
    ELRA-L0034 BrasiLEX Brazilian Portuguese lexicon
    _______________________________________

    Multifunctional monolingual lexicon of the Brazilian variety of Portuguese,
    consisting of about 65,000 entries (lemmas) and 1,600 corresponding
    inflexion paradigms. The set of entries includes compound words, and
    the inflexion paradigms include information regarding enclitics and
    augmentative/diminutive degree. Morphological information is encoded
    with maximum granularity and conforms to the EAGLES recommendations.

    =====================================
    For further information, please contact:

         ELRA/ELDA
         55-57 rue Brillat-Savarin
         F-75013 Paris, France
         Tel: +33 01 43 13 33 33
         Fax: +33 01 43 13 33 30
         E-mail: mapelli@elda.fr

    or visit the online catalogue on our Web site:

         http://www.icp.grenet.fr/ELRA/home.html
         or http://www.elda.fr
    =====================================


