Corpora: ELRA News

From: Magali Duclaux (
Date: Fri Jan 11 2002 - 11:43:11 MET


    [Our apologies if you receive multiple copies of this announcement]

    ELRA - European Language Resources Association

    We are pleased to announce the new resources
    available in our catalogue of language resources:

    ELRA W0030 Arabic Data Set
    ELRA W0031 GeFRePaC - German French Reciprocal
    Parallel Corpus

    A short description of these two new resources is given
    below. Please visit the online catalogue for further details.

    ELRA W0030 Arabic Data Set:
    The corpus contains Al-Hayat newspaper articles with
    value added for the development of Language Engineering
    and Information Retrieval applications. Data has
    been organised in 7 subject-specific databases according
    to the Al-Hayat subject tags. Mark-up, numbers, special
    characters and punctuation have been removed. The size
    of the total file is 268 MB. The dataset contains 18,639,264
    distinct tokens in 42,591 articles, organised in 7 domains.

    ELRA W0031 GeFRePaC - German French Reciprocal
    Parallel Corpus:
    GeFRePaC was produced in the framework of the LRsP&P
    project. It contains 30 million words: 15 million for the
    German language, 15 million for the French language.
    It covers natural general language as used in
    public socio-political discourse and it has a focus on
    multilingual administration and commercial and legal
    documentation. It was created for the purpose of
    developing, enhancing and improving translation aids.

    For further information, please contact:

    55-57 rue Brillat-Savarin
    F-75013 Paris, France

    Tel:    +33 01 43 13 33 33
    Fax:    +33 01 43 13 33 30


    or visit our Web site:
