RE: Corpora: when does a subcorpus become a corpus

From: P. Kaszubski (przemka@amu.edu.pl)
Date: Sat Jan 05 2002 - 01:47:24 MET


    On 4 Jan 2002, at 11:58, Sampo Nevalainen wrote:

    > height of human beings on the basis of a basketball team. The problem with
    > language is that exceptions are often not evident and not easily detected
    > since there is no clear "reference set" for language. In principle, if
    > your findings are truly generalizable you should get similar results from
    > any corpus, although there is obviously more "noise" in more "general"
    > corpora. Am I right? Or am I a pedant? Or both. ( About the "Terms in

    I think the similarity of results is strongly affected by corpus
    size and by the Zipfian distribution of words. Some interesting
    features will only show up when the (sub)corpora being compared are
    large enough, and this in turn depends on the composition of the
    "general corpus" from which you may have extracted them (if you
    have done so). Now we're back to the issue of how large a corpus,
    subcorpus, or special corpus should be in order to be
    representative not just of a given genre, variety, etc., but also
    of the linguistic feature(s) under investigation. Are 5
    occurrences (in a million running words or fewer) enough? This is
    yet another argument for the conclusion that, in order to study
    something in a corpus-based (or corpus-driven) manner, you first
    need to clearly define this "something" and set out your purpose.
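    One rough way to see how corpus size and a Zipfian distribution
    interact with a threshold such as "5 occurrences per million words"
    is the minimal sketch below. It assumes an idealised Zipf
    distribution (exponent s = 1) over a fixed vocabulary and treats
    occurrences as independent Poisson events. Real text violates both
    assumptions, because words cluster in particular texts. The function
    names and the example settings (100,000 word types, a word at rank
    10,000) are hypothetical and chosen only for illustration.

```python
import math


def zipf_probability(rank: int, vocab_size: int, s: float = 1.0) -> float:
    """Relative frequency of the word type at `rank` under an idealised
    Zipf distribution over `vocab_size` types with exponent `s`."""
    if not 1 <= rank <= vocab_size:
        raise ValueError("rank must lie between 1 and vocab_size")
    # Normalising constant: generalised harmonic number H(vocab_size, s).
    harmonic = sum(1.0 / k ** s for k in range(1, vocab_size + 1))
    return (1.0 / rank ** s) / harmonic


def expected_count(rank: int, corpus_tokens: int, vocab_size: int, s: float = 1.0) -> float:
    """Expected number of occurrences of the rank-`rank` type in a corpus
    of `corpus_tokens` running words."""
    return corpus_tokens * zipf_probability(rank, vocab_size, s)


def tokens_needed(rank: int, min_count: int, vocab_size: int, s: float = 1.0) -> int:
    """Smallest corpus size (in running words) at which the rank-`rank`
    type is expected to occur at least `min_count` times."""
    return math.ceil(min_count / zipf_probability(rank, vocab_size, s))


def per_million(count: int, corpus_tokens: int) -> float:
    """Normalised frequency: occurrences per million running words."""
    return count * 1_000_000 / corpus_tokens


def prob_at_least(k: int, expected: float) -> float:
    """P(X >= k) for X ~ Poisson(expected): the chance of seeing a feature
    at least k times if its occurrences were independent events."""
    below_k = sum(math.exp(-expected) * expected ** i / math.factorial(i) for i in range(k))
    return 1.0 - below_k


if __name__ == "__main__":
    # Hypothetical settings: 100,000 word types, Zipf exponent 1,
    # and a mid-frequency word at rank 10,000.
    vocab, rank = 100_000, 10_000
    lam = expected_count(rank, 1_000_000, vocab)
    print(f"expected occurrences per million words: {lam:.2f}")
    print(f"running words needed to expect 5 hits: {tokens_needed(rank, 5, vocab):,}")
    print(f"P(at least 5 hits in 1M words): {prob_at_least(5, lam):.3f}")
    print(f"5 hits in 250,000 words = {per_million(5, 250_000):.1f} per million")
```

    Under these assumptions, the expected count of a given word grows
    linearly with corpus size. Even when the expected count is 5, the
    Poisson model gives a substantial chance of observing fewer hits.
    This is one way to see why a fixed raw threshold says little unless
    the corpus size and the feature being studied are specified first.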

    (Slightly tardy) Season's greetings to you and all "corporeans",

    Przemek

    =======================================
    Dr Przemyslaw Kaszubski
    t: +48 61 8293515
    e: przemka@amu.edu.pl
    w: http://elex.amu.edu.pl/ifa/staff/kaszubski.html

    (ENGLISH) LEARNER CORPORA PAGE:
    http://main.amu.edu.pl/~przemka

    COMPREHENSIVE CORPORA BIBLIOGRAPHY:
    http://main.amu.edu.pl/~przemka/welcome.html#Corpbibl

    School of English
    Adam Mickiewicz University
    Al. Niepodleglosci 4
    61-874 Poznan
    t: +48 61 8293506
    f: +48 61 8523103
    w: http://elex.amu.edu.pl/ifa
    =======================================


