Corpora: Corpora Q: Text length differences in parallel text

From: araceli.alonso@iula.upf.es
Date: Mon Oct 15 2001 - 15:47:45 MET DST


    Dear Mr. Steinberger:

    I am writing on behalf of Dr. Lluís de Yzaguirre of the Institute
    for Applied Linguistics (Institut Universitari de Lingüística Aplicada)
    at Pompeu Fabra University, where we are working with parallel texts
    in several languages (English, Spanish and Catalan).
    At the moment we are developing a text alignment system. Most aligners
    are based on statistics, and they usually run into problems when the
    texts to be aligned are complex or not translated literally. We
    have developed a system that also draws on corpus processing, that
    is, it does not rely on statistics alone. If you are interested in the
    technique behind the system, you can find more information
    at http://terminotica.upf.es/CREL/atenes.ps.
    At the following address,
    http://terminotica.upf.es/academic/ENES/Default.htm, you will also
    find an example of aligned English-Spanish texts*. The texts are
    taken from the book Capitalism, Socialism and Democracy by Joseph
    Alois Schumpeter and its Spanish translation. The English text has
    72,621 words and the Spanish one has 93,858 words. This sample is not
    significant, but at present the system achieves 100% sentence
    alignment and 70% lexical alignment.
    The latest version of the tests we are running will be presented in
    two weeks at a congress on contrastive linguistics in Santiago de
    Compostela. If you are interested, we can send you the paper after the
    congress.

    If you need any more information, please do not hesitate to contact us.
    Yours sincerely,

    Araceli Alonso
    Institut Universitari de Lingüística Aplicada

    *Examples of aligned texts in other language pairs (English-Catalan
    and Catalan-Spanish) are also available at the following address:
    http://terminotica.upf.es/academic/


