Re: Corpora: What is a corpus

From: Oliver Mason (oliver@clg.bham.ac.uk)
Date: Fri Jan 28 2000 - 13:14:18 MET

    On Fri, Jan 28, 2000 at 12:06:09PM +0100, Sabine Bartsch, FB02 SprachLit wrote:

    > Would Oliver agree that 'filtered' in his definition of a
    > corpus is (near)synonymous with 'analysed'?

    Probably. The main point I wanted to make was that I understand a
    corpus to be a lump of real language, not extracts of the same. So you
    could have a corpus of almost anything that is a text type or genre,
    but it wouldn't be a corpus any more once you meddle with it, e.g. by
    extracting all proverbs, noun phrases or whatnot. The result of that
    would be a list of all proverbs or noun phrases occurring in a
    particular corpus.

    By what I rather imprecisely called `filtering' I meant this extraction
    of elements from a corpus, not the creation of a corpus from the
    infinite amount of language data by selecting a sample of it. But how
    do you collect a `corpus of past tense sentences'? Where do you find
    them in the real world? You can find the plays of Shakespeare as an
    entity which, of course, is a sample of language, and could conceivably
    be treated as a corpus of early modern English drama.

    Oliver

    -- 
    //\\ computer officer | corpus research | department of english | school of  -
    //\\ humanities | university of birmingham | edgbaston | birmingham b15 2tt  -
    \\// united kingdom | phone +44-(0)121-414-6206 | fax +44-(0)121-414-5668/\  -
    \\// mobile 07050 104504 | http://www.clg.bham.ac.uk | o.mason@bham.ac.uk\/  -
    


