Re: Corpora: overuse and underuse of learner English

From: Patrick Gillard (pgillard@cambridge.org)
Date: Wed Dec 12 2001 - 12:06:34 MET

    At 01:44 PM 12/11/01 -1000, Robert Bley-Vroman wrote:
    >At 8:28 AM -1000 12/11/01, xiaotian guo wrote:
    >
    >>It is unavoidable to touch overuse and
    >>underuse in the study of corpora comparison. But to what extent does the
    >>difference of a certain figure reach when we can say overuse or underuse
    >>occurs (I am poor in statistics)?
    >
    >The obvious simple thing is to develop some measurement of rate-of-use.
    >Normally, this would be a proportion (e.g. 20% of the verbs are present
    >tense in native-speaker corpus whereas 40% are present tense in learner
    >corpora). A simple statistic you could calculate would be a confidence
    >interval for the proportion (easy to do by hand even for someone who is
    >poor in statistics). Report the proportion and the confidence interval. If
    >the confidence intervals for the two proportions overlap, it wouldn't be
    >wise to claim overuse or underuse.
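
    The check described in the quoted message could be sketched roughly as
    follows. This is only an illustration: the message does not say which
    kind of interval is meant, so the normal-approximation (Wald) interval
    is assumed here, and the function names and the counts of 1,000 verbs
    per corpus are made up so that they reproduce the 20% and 40% figures.

```python
import math


def proportion_ci(hits, total, z=1.96):
    """Return (proportion, lower, upper) using the normal approximation.

    z=1.96 corresponds to an approximate 95% confidence level.
    """
    if total <= 0 or not 0 <= hits <= total:
        raise ValueError("need 0 <= hits <= total and total > 0")
    p = hits / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    # Clamp to [0, 1] because a proportion cannot fall outside that range.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def intervals_overlap(a, b):
    """True if the (lower, upper) parts of two proportion_ci results overlap."""
    return a[1] <= b[2] and b[1] <= a[2]


# Hypothetical counts: present-tense verbs out of all verbs in each corpus.
native = proportion_ci(200, 1000)
learner = proportion_ci(400, 1000)

print("native:  %.3f (%.3f to %.3f)" % native)
print("learner: %.3f (%.3f to %.3f)" % learner)
print("overlap:", intervals_overlap(native, learner))
```

    The normal approximation is unreliable for small samples or for
    proportions close to 0 or 1, so this is only a first rough check.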

    Can I add a further caution? If one attempts to draw conclusions from
    comparisons of a native-speaker corpus and a non-native-speaker corpus, it
    is important to make sure that one is comparing like with like.

    When learners are given writing tasks they are sometimes asked to produce
    types of texts that don't occur very frequently in native-speaker English.
    For example, if a student is asked to describe their daily routine they
    will produce a lot of present simple structures, but in native-speaker
    writing you are not very likely to find a text like that. In fact,
    if you use a native speaker corpus that has a large amount of newspaper
    data in it, you may find that the simple past is *over-represented* in your
    corpus compared to native speaker English of other types, because what
    newspapers are mostly concerned with is what happened *yesterday*.

    By the way, I do think that rate-of-use studies are very useful for
    analysing learner English. You just have to be careful that you go into it
    with your eyes open.

    Patrick Gillard
    Senior Commissioning Editor
    ELT Dictionaries
    Cambridge University Press

    pgillard@cambridge.org

    http://www.cambridge.org/elt

    Direct line: +44 (0)1223 325596

    Cambridge Learner's Dictionary (published February 2001)
    http://www.cambridge.org/elt/cld


