RE: Corpora: when does a subcorpus become a corpus

From: Sampo Nevalainen (samponev@cc.joensuu.fi)
Date: Fri Jan 04 2002 - 13:59:34 MET

    At 14:44 4.1.2002 +0300, P bI K O B_ B.B. wrote:
    >I am afraid that my opinion is different.
    >If I have any special corpus - Russian newspaper prose, Mexican proverbs
    >or German political metaphors - then - any my results based on these
    >corpora would be true for the language of Russian newspapers, Mexican
    >proverbs etc ONLY.
    >But - my results - any observations on any speech phenomenon based on
    >general properly compiled corpus would be true for the language IN GENERAL.

    Dear Vladimir, I see no difference between your opinion and mine, except
    that I doubted whether general corpora really exist. By 'general corpus'
    I mean here a corpus you could use for any linguistic purpose, that is,
    a corpus supposed to be representative of a language in general.

    Clearly, the more restricted or specialized your corpus is, the less
    your results generalize to the language as a whole, because the
    (usually imaginary) total populations differ: they are smaller and
    more clear-cut for more specialized corpora. No doubt, not everything
    that is true for a corpus of Russian newspaper prose is true for the
    Russian language as a whole, but the two still have something in common.
    The advantage of a specialized corpus is that the (special) features you
    are interested in are more evident there. However, suppose that any
    "smaller population" (i.e. sample, or corpus) is a part of the "totality"
    (the language), which itself can never be captured in full. Then, if you
    are going to say something about the totality, your findings should also
    be observable, more or less easily, in any part of it, that is, in any
    sample (any more or less specialized corpus). So I guess that (almost)
    anything that is true for a "general corpus" should be true for more
    specialized corpora as well, if you consider it a feature of the
    language. (But again, obviously, not the other way round.) The point of
    this fuzzy writing is that one can get a picture of language only
    through cumulative evidence gathered from different sources, i.e. from
    (more or less specialized) corpora. And even then the picture will be
    skewed and scanty, since we do not know exactly what we are looking
    for; we do not know when the picture is complete (if ever). Well, maybe
    I should have taken up philosophy instead of corpus linguistics...

    sincerely,
    sampo

    ( : ============================================= : )

    Sampo Nevalainen, M.A.
    Researcher
    University of Joensuu
    Savonlinna School of Translation Studies
    P.O.Box 48
    FIN-57101 Savonlinna
    FINLAND

    tel +358-15-511 70 (operator)
             +358-15-511 7704
    fax +358-15-515 096
    email samponev@cc.joensuu.fi
    http://www.joensuu.fi/slnkvl/


