Re: Corpora: Arabic and Natural Language processing

Chris Brew (Chris.Brew@edinburgh.ac.uk)
Mon, 22 Sep 1997 09:14:40 +0100 (BST)

>Dear Friends,
> I am an M.Sc. student in computer science and a vendor at
>IBM-Egypt. As a student interested in NLP, and as an Egyptian whose mother
>tongue is Arabic, I have found that one of the main obstacles to achieving
>real progress in this field is the limited availability of electronic Arabic
>text.
>I invite everyone interested in discussing this subject to join a side
>conversation on this topic.
>I have some ideas I want to share with you.
>
>
> Mohamed Farouk Noamany
>

From the Language Technology FAQ

(http://www.ltg.ed.ac.uk/helpdesk/faq/)

Texts/Corpora: No 0050


Can you tell me where to find Arabic texts?

The largest Arabic corpus available is the Al-Hayat 1995 CD (for the Mac).
It has some 140MB of data (about 23M words) in about 44,000 files, all
in Arabic Mac encoding (a superset of ISO 8859-6). It is available from:

Dr. Imad Bachir
Al-Hayat Publishing Company
Kensington Centre
66 Hammersmith Road
LONDON W14 8YT
+44 (0) 171 602 9988 (Tel)
+44 (0) 171 602 4963 (Fax)
ibachir@alhayat.com
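Because the Al-Hayat files are in Arabic Mac encoding, most present-day tools will need them converted first. The following is a minimal sketch, assuming that Python's built-in `mac_arabic` codec matches the encoding actually used on the CD (the FAQ says only "Arabic Mac encoding (a superset of ISO 8859-6)", so check a few files by eye). The function names are hypothetical, and so is the choice to rewrite classic Mac CR line endings as LF.

```python
from pathlib import Path

# Assumption: the CD's text matches Python's built-in "mac_arabic" codec.
# The FAQ only says "Arabic Mac encoding (a superset of ISO 8859-6)".
SOURCE_ENCODING = "mac_arabic"


def mac_arabic_to_utf8(data: bytes, normalize_newlines: bool = True) -> bytes:
    """Decode Mac Arabic bytes and re-encode them as UTF-8 (hypothetical helper).

    Classic Mac OS ended lines with a bare CR; when normalize_newlines is
    True, CRLF and bare CR line endings are rewritten as LF.
    """
    text = data.decode(SOURCE_ENCODING)
    if normalize_newlines:
        # Handle CRLF first so it becomes a single LF, not two.
        text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.encode("utf-8")


def convert_tree(src_dir: str, dst_dir: str) -> int:
    """Convert every file under src_dir into a mirrored tree under dst_dir.

    Returns the number of files written. The originals are never modified.
    """
    src_root, dst_root = Path(src_dir), Path(dst_dir)
    count = 0
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue  # skip directories
        dst = dst_root / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        dst.write_bytes(mac_arabic_to_utf8(src.read_bytes()))
        count += 1
    return count
```

For example, `convert_tree("/path/to/cd", "/path/to/utf8-copy")` (placeholder paths) writes a UTF-8 copy of the corpus and returns the number of files converted. Converting once up front means downstream tokenizers and concordancers need not know about the Mac encoding at all. Keep the original files, though, since the copy is only as faithful as the codec's mapping table.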

Also, Khalid Choukri (elra@calvanet.calvacom.fr) suggests:

You should contact either Fathi Debili from the French Research Center
(debili@msh-paris.fr) or Ms. Nadia Hegazi from ERI, Cairo
(nadia@eri.sci.eg).

Last edited by Colin Matheson, 10-07-97

Email: Chris.Brew@edinburgh.ac.uk
Address: Language Technology Group, HCRC,
2 Buccleuch Place, Edinburgh EH8 9LW, Scotland
Telephone: +44 131 650 4632 Fax: +44 131 650 4587