Corpora: Corpus Linguistics

From: ramesh@clg.bham.ac.uk
Date: Mon Apr 30 2001 - 00:18:16 MET DST


    James L. Fidelholtz wrote:
    Hmmm. Maybe I'm not cut out to be a 'real' corpus linguist, if
    this is true, since my principal interest is in relatively 'rare'
    phenomena.

    Ramesh writes:
    As a large corpus would seem to be the best empirical evidence
    we have at our disposal, only a `real' corpus linguist would be
    able to tell you what is a `rare' phenomenon and what isn't....
    The reason for focussing on non-rare phenomena is that one
    can be more certain that one is looking at language features
    that obtain throughout many varied idiolects, text-types, modes,
    genres, contexts, etc.
    The problem with rare phenomena is that one cannot be certain that
    one of those factors (e.g. idiolect, typographic error, highly
    constrained context) is not the sole explanation for them. They are
    therefore less generalizable, and must be consigned to
    the general rag-bag category at the bottom of every frequency list:
    items on which one has to suspend judgement until more data
    confirms each to be a one-off, or shows it to have been the tip of
    the iceberg of a hitherto unnoticed phenomenon. Such an item may also
    be the harbinger of language change, as a synchronic corpus
    becomes a diachronic one when data is collected over a longer period of
    time.

    Best
    Ramesh Krishnamurthy
    Birmingham


