Re: Corpora: when does a subcorpus become a corpus

From: P. Kaszubski
Date: Sat Dec 29 2001 - 01:00:56 MET

    Questions of representativeness are vital for corpus research, as
    we all know only too well. I have been doing quite a bit of "corpora
    trimming" for my own comparative purposes when trying to obtain
    the most representative results for my phraseological studies with
    learner corpora. Much depends, of course, on how clear-cut the
    boundaries of the genre under analysis are. My own observation has
    been that it is sometimes better to ease off the chase after
    "comparability" between corpora, because chances are that we will
    never be exactly satisfied with the degree of match, and
    consequently with the statistics derived. It makes better sense to
    me to obtain several corpora (or subcorpora) that claim to
    represent the same, or a similar enough, genre and to perform
    multiple rather than bilateral statistical comparisons.
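
    As a rough illustration of what a multiple (rather than bilateral)
    comparison might look like, here is a minimal sketch. It assumes the
    log-likelihood (G2) statistic, which is one common choice in corpus
    work and is not a method named in this message. The function name
    g2_across_corpora and the counts in the usage lines are hypothetical.

```python
import math


def g2_across_corpora(hits, sizes):
    """Log-likelihood (G2) for one item's frequency across k corpora.

    hits[i]  -- occurrences of the item (e.g. a phrase) in corpus i
    sizes[i] -- total number of tokens in corpus i
    Returns (G2, degrees_of_freedom) for the 2 x k contingency table.
    """
    if len(hits) != len(sizes) or len(hits) < 2:
        raise ValueError("need matching counts for at least two corpora")

    total_hits = sum(hits)
    total_size = sum(sizes)

    g2 = 0.0
    for h, n in zip(hits, sizes):
        rest = n - h                           # tokens that are not the item
        exp_h = n * total_hits / total_size    # expected hits if all corpora agree
        exp_rest = n - exp_h                   # expected non-hits
        # Cells with an observed count of zero contribute nothing to the sum.
        if h > 0:
            g2 += h * math.log(h / exp_h)
        if rest > 0:
            g2 += rest * math.log(rest / exp_rest)

    return 2.0 * g2, len(hits) - 1


# Hypothetical counts, for illustration only: one phrase in three
# (sub)corpora that all claim to represent the same genre.
g2, df = g2_across_corpora([12, 30, 19], [100_000, 150_000, 120_000])
print(f"G2 = {g2:.2f} on {df} degrees of freedom")
```

    A single test over all k corpora asks whether the item is spread
    evenly across them, so no one imperfectly matched pair has to carry
    the whole comparison.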

    Having said this, I do of course declare myself an advocate of
    careful, principled and well documented corpus compilation in the
    first place. The more we know about the EAP part of the BNC the
    better we can design our tests. BTW, is the notion of SUBCORPUS
    discussed theoretically and/or defined anywhere? Does it
    necessarily carry with it the same value of representativeness (as
    defined e.g. in Sinclair's 3C book, or the BNC handbook - in
    contrast to opportunistic text collections)? How organised need we
    be when extracting texts from a corpus to be able to call the result a
    "subcorpus"? It just struck me as an interesting question.


    P. Kaszubski

    Dr Przemyslaw Kaszubski
    t: +48 61 8293515


    School of English
    Adam Mickiewicz University
    Al. Niepodleglosci 4
    61-874 Poznan
    t: +48 61 8293506
    f: +48 61 8523103
