Here is a short quotation from Jennifer Pearson's "Terms in Context"
(Amsterdam 1998), p. 45:
-- Sinclair, who states that corpora can be divided into subcorpora, and that corpora and subcorpora can be divided into components, defines a subcorpus as having "all the properties of a corpus but happens to be part of a larger corpus" (1994a:4). Thus, a subcorpus must have all the properties of a larger corpus. We understand this to mean that it is representative of the larger corpus. A component, on the other hand, according to Sinclair, illustrates a particular type of language and is selected "according to a set of linguistic criteria that serve to characterize its linguistic homogeneity" (Sinclair 1994a:4). It differs from a subcorpus in that it is not intended to be representative of the corpus from which it is drawn and is therefore not necessarily an adequate sample of a language. --

I did not go back to Sinclair ("Corpus Typology: A Framework for Classification", EAGLES 1994), but according to Pearson, "a subcorpus must have all the properties of a larger corpus" and is thus representative of the larger corpus. How this can be achieved is another question, although it is obviously safer to state that a subcorpus is representative of the larger corpus than to argue that the larger corpus (and, consequently, the subcorpus) is representative of a language (or genre, etc.).

Anyway, using the terms defined above (without intending to agree fully with Pearson), the set of EAP texts detached from the BNC would probably be called a "component" rather than a "subcorpus". Personally, I would like to call ANY corpus detached from another corpus a "subcorpus", regardless of its content or composition.

Whatever a set of texts is called, the question of representativeness remains. Here I agree with Ute Roemer, who wrote: "The important question in this context is 'What do you want to do with the (sub)corpus?'"
sincerely, Sampo
P.S. Please read this as a note from a person who tends to regard the notion of "representative of a language" as an oxymoron, a "mission impossible".
( : ============================================= : )
Sampo Nevalainen, M.A.
Researcher
University of Joensuu
Savonlinna School of Translation Studies
P.O.Box 48
FIN-57101 Savonlinna
FINLAND

tel +358-15-511 70 (operator)
    +358-15-511 7704
fax +358-15-515 096
email samponev@cc.joensuu.fi
http://www.joensuu.fi/slnkvl/