Re: Corpora: Corpus size

From: Norbert Schlueter (nosch@zedat.fu-berlin.de)
Date: Sun Jun 03 2001 - 14:32:15 MET DST

    Dear all,

    size, i.e. number of words, is obviously not the only factor when
    compiling a corpus for special investigations. Far more important
    seems to be getting at least 400 cases of whatever you are looking
    for. It can be shown that even in the worst case of a balanced
    distribution for a variable with two values [e.g. ASPECT:
    progressive/non-progressive --> 50%/50%], 400 cases suffice to
    estimate each proportion to within +/-0.05 at roughly the 95%
    confidence level (n = (4*p*(1-p))/e^2, where e is the margin of
    error and the 4 approximates 1.96^2). I wonder if anyone has done
    some work on this and can comment on the number of cases needed if
    the variable has more than two values (e.g. SUBJECT: 1PSG, 2PSG,
    etc.)
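
    For anyone who wants to check the arithmetic, here is a minimal
    sketch of the formula above. It assumes the 0.05 is read as a
    margin of error and the 4 as z^2 with z = 2 (roughly 95%
    confidence). The function name required_cases and its parameter
    names are made up for this illustration.

```python
import math


def required_cases(p: float = 0.5, margin: float = 0.05, z: float = 2.0) -> int:
    """Cases needed so that a proportion p is estimated to within +/- margin.

    Implements n = z^2 * p * (1 - p) / margin^2, i.e. the formula quoted
    above with z = 2 (assumption: the constant 4 stands for z^2).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if margin <= 0.0:
        raise ValueError("margin must be positive")
    n = z ** 2 * p * (1.0 - p) / margin ** 2
    # Round away floating-point noise before taking the ceiling, so that an
    # exact result such as 400 is not bumped to 401 by representation error.
    return math.ceil(round(n, 9))


if __name__ == "__main__":
    # The balanced 50%/50% split is the worst case because p * (1 - p)
    # is largest at p = 0.5; skewed distributions need fewer cases.
    for p in (0.5, 0.3, 0.1):
        print(f"p = {p}: n = {required_cases(p)}")
```

    This only covers the two-valued case; it says nothing about
    variables with more than two values.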

    Best, Norbert

    ------------------------
    Norbert Schlüter
    English Language Pedagogy
    Freie Universität Berlin
    nosch@zedat.fu-berlin.de
