Re: Corpora: What is a corpus

From: Lou Burnard (lou.burnard@computing-services.oxford.ac.uk)
Date: Fri Jan 28 2000 - 11:38:27 MET

    This turns out to be quite an interesting discussion, since it really
    hinges on what a "proverb" is. If Francois had said (say) a corpus of
    sermons, or a corpus of advertisements, or a corpus of texts composed
    by 18th-century French expatriate seamen with wooden legs, I don't
    think Oliver would have turned a hair (well, maybe in the last
    example) because all of those things are definable as types of text or
    artefact or entity or whatever. But proverbs don't seem to fit in with
    that list of things somehow: where would you look for proverbs? They
    don't typically appear in isolation -- you don't go to the book shop
    and say "What proverbs have been published lately?" -- the newspapers
    don't have lists of today's hot proverbs -- no-one ever says "I think
    I'll create a proverb today" -- all of which makes me think that a
    proverb is not a text, but a judgment about a bit of a text. A
    collection of things-judged-proverbial is an interesting text,
    certainly, but it doesn't seem to be a corpus as we currently think of
    one.

    So while I agree with Lucian (and everyone else) that it's the act of
    filtering which defines a corpus, I feel the need to define the nature
    of the holes in the filter a bit more precisely. In other words, I
    think we need a definition for the *components* of a corpus, which
    would accept (say) a classified advert or a conversation with a travel
    agent but reject a metaphor or a proverb or even (here I feel the
    ground a bit shaky) a sentence containing a past tense verb.
     
    Lou

     ----------------------------------------------------------------
     Lou Burnard http://users.ox.ac.uk/~lou
     ----------------------------------------------------------------


