Re: PD: [Corpora-List] Date: Wed, 11 Sep 2002 15:16:20 +0200

From: Sampo Nevalainen (samponev@cc.joensuu.fi)
Date: Thu Sep 12 2002 - 16:26:54 MET DST

    The whole terminology of corpus linguistics is admittedly pretty
    anarchic... and I can't help messing it up a little more :-)

    Personally, I hardly ever use the term "translation corpus", for the name
    only suggests that the corpus consists of translations; in principle, a
    translation corpus could be monolingual, bilingual or multilingual, and
    contain just about anything that fits under the notion of "translation"...
    (I am not going to speculate here about what a translation is... there are
    different opinions about that as well). By "parallel corpus" I mean a bi- or
    multilingual corpus of originals and their translations into one or more
    languages. Almost synonymous expressions for "parallel" are words like
    "collateral", "concurrent" and "simultaneous". This implies that the texts
    are quite strictly related to each other - in a sense one could say that
    they mirror each other. So, for me it makes sense to use this term with
    originals and their translations, which exist interdependently (although,
    of course, the process of production is usually not simultaneous, as far as
    written language is concerned -- but cf. simultaneous interpreting...). A
    "comparable corpus", then, is, as the name suggests, a corpus consisting of
    (pairs or groups of) texts produced independently of each other, but that
    are considered to be comparable in certain respects. For example, at our
    university we have compiled a comparable corpus, consisting of translated
    and non-translated Finnish from different genres - but the translated texts
    are _not_ translations of the texts originally written in Finnish. That is,
    the corpus as a whole is a comparable corpus of two language variants.
    (Hmm... perhaps one might call the translational part of the corpus a
    "translation corpus"..?) By the same logic, a combined corpus of the
    Brown and LOB corpora could be called a comparable corpus (of American and
    British English). And, similarly, we could have a comparable corpus of a
    certain special field or domain in different languages. Couldn't it be
    easier..?

    sampo

    ( : ============================================= : )

    Sampo Nevalainen, M.A.
    Researcher
    University of Joensuu
    Savonlinna School of Translation Studies
    P.O.Box 48
    FIN-57101 Savonlinna
    FINLAND

    tel +358-15-511 70 (operator)
             +358-15-511 7704
    fax +358-15-515 096
    email samponev@cc.joensuu.fi
    http://www.joensuu.fi/slnkvl/


