Corpora: ELRA News 1/2

From: Valerie Mapelli (mapelli@elda.fr)
Date: Tue Jun 13 2000 - 16:21:32 MET DST

    [ We apologise for the duplicate posting of this announcement ]
    ___________________________________________________________
                                    ELRA
                    European Language Resources Association
                                   ELRA News
    ___________________________________________________________

                         *** ELRA NEW RESOURCES ***

    We are happy to announce new resources available via ELRA:

    ELRA-W0024 PAROLE Portuguese Corpus
    ELRA-L0035 PAROLE Portuguese Lexicon

    A description of each database is given below.

    _______________________________________
    ELRA-W0024 PAROLE Portuguese Corpus
    _______________________________________

    The PAROLE Portuguese Corpus contains approximately 3 million
    running words of European Portuguese, distributed by medium
    as follows:
    - Newspaper: about 65%, from 3 titles covering the period 1996-1997;
    - Book: about 20%, from 12 titles from 3 publishing houses;
    - Periodical: about 5%, from 7 weekly issues of 1 title, 1996;
    - Miscellaneous: about 10%, from several files spread across 8 titles.
    The corpus was classified and encoded according to the PAROLE
    common core encoding standard. The corpus files are in SGML format.

    A subcorpus of the PAROLE Portuguese Corpus, which approximately
    reproduces the distribution by medium of the whole corpus
    (Newspaper: about 65%, Book: about 20%, Periodical: about 5%,
    Miscellaneous: about 10%), is also available.
    It contains about 250,000 words, morpho-syntactically tagged according
    to the PAROLE common tagset and morpho-syntactic annotation standards.
    Disambiguation was manually checked.

    _______________________________________
    ELRA-L0035 PAROLE Portuguese Lexicon
    _______________________________________

    The PAROLE Portuguese Lexicon consists of 20,000 entries
    encoded with morpho-syntactic and syntactic information, according
    to the PAROLE common encoding standards. The data is in SGML format.

    =====================================
    For further information, please contact:

         ELRA/ELDA                     Tel    +33 01 43 13 33 33
         55-57 rue Brillat-Savarin     Fax    +33 01 43 13 33 30
         F-75013 Paris, France         E-mail mapelli@elda.fr

    or visit the online catalogue on our Web site:

         http://www.icp.grenet.fr/ELRA/home.html
         or http://www.elda.fr
    =====================================


