Corpora: Q: Text length differences in parallel text

From: Ralf Steinberger (ralf.steinberger@jrc.it)
Date: Mon Oct 01 2001 - 17:32:36 MET DST

    Hello,

    We are interested in finding out about the average text length difference
    between texts and their translations (parallel texts). We would be
    interested in data for all eleven official European Union languages, but
    especially for the language pair English-Spanish. We want to use this (and
    further) information to automatically identify translations of a given text
    in a larger text collection.

    Text length differences could be expressed either by using the number of
    words or the number of characters. In our own sublanguage corpus, Spanish
    texts use about 13% more characters than their English equivalents, but we
    would like to have information pertaining to texts other than our own.
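
To make the two measures concrete, here is a minimal sketch of how a length ratio could be computed in characters or in words, and how it might serve as a first filter for translation candidates. The function names, the whitespace-based word count, and the 10% tolerance are assumptions made for illustration only; the 1.13 default simply reuses the Spanish/English character figure from our own corpus and may not hold for other text types.

```python
def length_ratio(source: str, target: str, unit: str = "chars") -> float:
    """Return the target length divided by the source length.

    unit="chars" counts characters; unit="words" counts
    whitespace-separated tokens (a simplifying assumption).
    """
    if unit == "chars":
        src_len, tgt_len = len(source), len(target)
    elif unit == "words":
        src_len, tgt_len = len(source.split()), len(target.split())
    else:
        raise ValueError("unit must be 'chars' or 'words'")
    if src_len == 0:
        raise ValueError("source text is empty")
    return tgt_len / src_len


def plausible_translations(source, candidates, expected_ratio=1.13, tolerance=0.10):
    """Return the names of candidates whose character-length ratio to
    the source lies within `tolerance` of `expected_ratio`.

    `candidates` maps a document name to its text. Both default values
    are illustrative assumptions, not recommended settings.
    """
    kept = []
    for name, text in candidates.items():
        ratio = length_ratio(source, text, unit="chars")
        if abs(ratio - expected_ratio) <= tolerance:
            kept.append(name)
    return kept
```

A filter like this can only narrow down the candidate set; many unrelated texts will happen to have a compatible length, which is why further information would be needed to confirm a translation.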

    Thanks in advance for any help with this. I shall send a summary of the
    responses to the list.

    Ralf

    Ralf Steinberger
    European Commission
    Joint Research Centre - Ispra site (http://www.jrc.it/langtech/)
