Aligned Turkish/English texts now available

Kemal Oflazer (ko@cs.bilkent.edu.tr)
Tue, 23 Jan 1996 16:01:25 +0300

Greetings,

Sample aligned Turkish and English texts are now available for general
use. They have been automatically aligned (by Kursat Ince) at the
sentence level using Gale and Church's align code (Computational
Linguistics, Vol. 19, No. 1, March 1993). There may be occasional problems
due to misidentification of sentence boundaries. Turkish has been coded
in all lower case, with the 6 upper-case ASCII characters (C, G, I, O, S,
U) representing the 6 non-ASCII Turkish characters.

Currently there are 6 parallel texts. Text 1 is a foreign ministry press
release, Texts 2 and 3 are the texts of two treaties, and Texts 4-6 are
sample texts OCR'ed from a journal on translation.
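
For readers who want the Turkish side in ordinary orthography, the
upper-case coding described above can be undone with a simple character
translation. The sketch below is only an illustration: the announcement
does not spell out which Turkish letter each upper-case character stands
for, so the mapping (C to ç, G to ğ, I to ı, O to ö, S to ş, U to ü) is an
assumption based on the obvious correspondence, and the name
decode_turkish is hypothetical.

```python
# Assumed mapping from the upper-case ASCII stand-ins to the six
# non-ASCII Turkish lower-case letters (not stated in the announcement).
TURKISH_MAP = str.maketrans({
    "C": "ç",
    "G": "ğ",
    "I": "ı",  # dotless i; plain lower-case "i" stays dotted
    "O": "ö",
    "S": "ş",
    "U": "ü",
})


def decode_turkish(text: str) -> str:
    """Replace the upper-case stand-ins with the assumed Turkish letters."""
    return text.translate(TURKISH_MAP)


if __name__ == "__main__":
    print(decode_turkish("tUrkCe"))  # türkçe
```

Since the Turkish text is otherwise all lower case, any upper-case
character from this set can be translated without ambiguity. Apply it
only to the Turkish half of each pair, not to the English sentences.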

These can be accessed by WWW at
http://www.cs.bilkent.edu.tr/~ko/Turklang/corpus/par-corpus/

Any corrections and suggestions are welcome.

Kemal Oflazer
ko@cs.bilkent.edu.tr