Re: Corpora: multilingual texts

Ted E. Dunning (ted@aptex.com)
Tue, 2 Dec 1997 12:36:16 -0800

I did some work on language identification and have an evaluation
corpus available for anybody who wants to try their hand at it. The
corpus was built by taking random samples from a Spanish/English
parallel corpus.
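
For anyone who wants to build a similar test set, here is a minimal
sketch of that kind of sampling. It is not the code in the tarball:
the function name, the (spanish, english) pair format, the fixed
snippet length, and the "es"/"en" labels are all assumptions made for
illustration.

```python
import random


def sample_test_items(pairs, n_items, length, seed=0):
    """Draw labelled text snippets from a parallel corpus.

    pairs   -- list of (spanish_text, english_text) tuples (assumed format)
    n_items -- number of snippets to draw
    length  -- snippet length in characters (assumed; shorter texts are
               returned whole)
    Returns a list of (label, snippet) tuples, label being "es" or "en".
    """
    rng = random.Random(seed)  # seeded so the test set is reproducible
    items = []
    for _ in range(n_items):
        es_text, en_text = rng.choice(pairs)
        # Pick one side of the aligned pair and remember its language.
        label, text = rng.choice([("es", es_text), ("en", en_text)])
        if len(text) <= length:
            snippet = text
        else:
            start = rng.randrange(len(text) - length + 1)
            snippet = text[start:start + length]
        items.append((label, snippet))
    return items
```

Each (label, snippet) pair can then be fed to a language identifier
and the predicted language compared against the label.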

I include with the test corpus both a technical report (somewhat
outdated) and working code (also somewhat outdated).

You can ftp the 1995 version of the test corpus, paper, and code from:

ftp://crl.nmsu.edu/pub/misc/lingdet_suite.tar.gz

If you want the latest description and code, please email me.