[Corpora-List] Parallel / Comparable / Translation

From: oliver@ccl.bham.ac.uk
Date: Thu Sep 12 2002 - 15:39:02 MET DST

    > I admit that the term "translation corpus" is confusing: you would rather
    > understand it as a "corpus of translations" than "corpus for translators" or
    > "used mainly by translators" (which is the right interpretation).

    I don't think there is a need for terms describing a corpus according to
    the expected users; who cares whether a corpus is used by translators,
    language teachers, or computational linguists?

    If it is a "corpus of translations", then it is either
    - a "translation corpus", if it contains texts in one language and their
      translations into one (or possibly more) other languages. This could
      be viewed as a subtype of a parallel corpus, which doesn't have the
      requirement that its elements are translations of each other.
    OR
    - a corpus consisting of texts in one language, which are translations
      of some other texts (which are not in the corpus). This would be a
      specialised sample of a monolingual corpus, similar in principle to a
      corpus of newspaper articles, or some other externally specified text
      type/genre/...

    It doesn't make sense from a technical point of view to have a `mixed
    bag' of texts in different languages in one single corpus `lump', unless
    they're separate parts (as in a parallel/translation/comparable corpus).
    So, a corpus containing elements in more than one language should really
    only be either parallel or comparable, or a translation corpus, if that
    is retained as an independent category rather than treated as just a
    subtype of the parallel corpus.

    [This, however, does not apply to archives or other collections, which can
    contain texts in whatever languages. But then, an archive is not a corpus.]

    Oliver

    -- 
     /\  \ lecturer | department of english | school of humanities
    //\\  \ the university of birmingham | edgbaston | birmingham b15 2tt
    \\//   \ united kingdom | phone +44(0)121-414-6206 | fax +44(0)121-414-5668
     \/     \ http://web.bham.ac.uk/o.mason/ | o.mason@bham.ac.uk
    


