Corpora: multilingual texts

D C Souter (cs@scs.leeds.ac.uk)
Mon, 1 Dec 1997 09:41:37 GMT

I am posting this on behalf of one of my project students; please
reply to him at cazcp@scs.leeds.ac.uk

---------------------------------------

I am a final year undergraduate student at the University of Leeds, and am
currently undertaking a final year project in the area of language
identification.

Specifically, given text containing more than one language, I am attempting to
develop a system that will determine the boundaries between the languages,
for example where the German changes to French.

Does anyone out there have any examples of multilingual texts, or suggestions
as to where I can find them? In this context by multilingual I mean texts
containing more than one language, rather than texts that have been translated
between languages.

I realise that I could artificially create some multilingual texts, but I
would rather use some genuine examples as test data.

The languages I am dealing with are English, French, German, Dutch, Italian,
Spanish, Irish Gaelic, Serbo-Croat and Portuguese.

Also, if anyone knows of any papers or projects relating to this, could they
please let me know?

Could anyone with any information please reply directly to me, at:

cazcp@scs.leeds.ac.uk

Thank you,

Chris Pyatt
School of Computer Studies
University of Leeds
Leeds LS2 9JT