Re: Corpora: when does a subcorpus become a corpus

From: Sampo Nevalainen
Date: Thu Jan 03 2002 - 10:36:04 MET


    Here is a short quotation from Jennifer Pearson's "Terms in Context"
    (Amsterdam 1998), p. 45:

    Sinclair, who states that corpora can be divided into subcorpora, and that 
    corpora and subcorpora can be divided into components, defines a subcorpus 
    as having "all the properties of a corpus but happens to be part of a 
    larger corpus" (1994a:4). Thus, a subcorpus must have all the properties of 
    a larger corpus. We understand this to mean that it is representative of 
    the larger corpus. A component, on the other hand, according to Sinclair, 
    illustrates a particular type of language and is selected "according to a 
    set of linguistic criteria that serve to characterize its linguistic 
    homogeneity" (Sinclair 1994a:4). It differs from a subcorpus in that it is 
    not intended to be representative of the corpus from which it is drawn and 
    is therefore not necessarily an adequate sample of a language.

    I did not go back to Sinclair ("Corpus Typology: A Framework for Classification", EAGLES 1994), but according to Pearson, "a subcorpus must have all the properties of a larger corpus" and is thus representative of the larger corpus. How this can be achieved is another question, although it is obviously safer to state that a subcorpus is representative of the larger corpus than to argue that the larger corpus (and, consequently, the subcorpus) is representative of a language (or genre, etc.).

    Anyway, using the terms defined above (without intending to agree fully with Pearson), the set of EAP texts extracted from the BNC would probably be called a "component" rather than a "subcorpus". Personally, I would like to call ANY corpus extracted from another corpus a "subcorpus", regardless of its content or composition. Whatever a set of texts is called, the question of representativeness remains. Here I agree with Ute Roemer, who wrote: "The important question in this context is 'What do you want to do with the (sub)corpus?'"

    Sincerely, Sampo

    P.S. Please regard this as a note from a person who tends to consider the notion of "representative of a language" an oxymoron, a "mission impossible".

    ( : ============================================= : )

    Sampo Nevalainen, M.A.
    Researcher
    University of Joensuu
    Savonlinna School of Translation Studies
    P.O.Box 48
    FIN-57101 Savonlinna
    FINLAND

    tel +358-15-511 70 (operator) +358-15-511 7704 fax +358-15-515 096 email
