I'm not aware of any generally available spiders that have built-in
language identification. However, what you're interested in has been
done at New Mexico State University. See:

  Jim Cowie, Evgeny Ludovik, and Ron Zacharski, "An Autonomous,
  Web-based, Multilingual Corpus Collection Tool", Proceedings of the
  International Conference on Natural Language Processing and Industrial
  Applications, 1998. <http://crl.nmsu.edu/~raz/langrec/nlpia.htm>

Their work did include Arabic as one of the languages.
Best,
Philip
----------------------------------------------------------------
Philip Resnik, Assistant Professor
Department of Linguistics and Institute for Advanced Computer Studies
1401 Marie Mount Hall            UMIACS phone:      (301) 405-6760
University of Maryland           Linguistics phone: (301) 405-8903
College Park, MD 20742 USA       Fax:               (301) 405-7104
http://umiacs.umd.edu/~resnik    E-mail: resnik@umiacs.umd.edu