[Our apologies if you receive multiple copies of this announcement]
************************************************************
ELRA - European Language Resources Association
************************************************************
We are pleased to announce the new resources
available in our catalogue of language resources:
ELRA W0030 Arabic Data Set
ELRA W0031 GeFRePaC - German French Reciprocal
Parallel Corpus
A short description of these two new resources is given
below.
Please visit the online catalogue to get further details:
http://www.elda.fr/catalog.html
ELRA W0030 Arabic Data Set:
The corpus contains Al-Hayat newspaper articles, with
value added for the development of Language Engineering
and Information Retrieval applications. The data has
been organised in 7 subject-specific databases according
to the Al-Hayat subject tags. Mark-up, numbers, special
characters and punctuation have been removed. The total
file size is 268 MB. The dataset contains 18,639,264
distinct tokens in 42,591 articles across the 7 domains.
ELRA W0031 GeFRePaC - German French Reciprocal
Parallel Corpus:
GeFRePaC was produced in the framework of the LRsP&P
project. It contains 30 million words: 15 million for
German and 15 million for French.
It covers natural general language as used in
public socio-political discourse and it has a focus on
multilingual administration and commercial and legal
documentation. It was created for the purpose of
developing, enhancing and improving translation aids.
=====================================
For further information, please contact:
ELRA/ELDA
55-57 rue Brillat-Savarin
F-75013 Paris, France
Tel:
Fax:
E-mail: mapelli@elda.fr
or visit our Web site:
http://www.icp.grenet.fr/ELRA/home.html
or http://www.elda.fr
=====================================