Re: Corpora: What is a corpus

From: Ute Römer (ute.roemer@uni-koeln.de)
Date: Fri Jan 28 2000 - 12:54:01 MET


    I would like to add something on the problem of proverbs and corpus
    composition. I think a proverb cannot be regarded as a "component" of a
    corpus (although proverbs are of course included in corpora), because a
    proverb is not a "kind of text" in the way that newspaper articles,
    novels, or telephone conversations are "text types" (sorry, I can't
    find a more adequate expression for this).

    Ute Römer

    -----Original Message-----
    From: Lou Burnard <lou.burnard@computing-services.oxford.ac.uk>
    To: Lucian Galescu <galescu@cs.rochester.edu>
    Cc: CORPORA@hd.uib.no <CORPORA@hd.uib.no>
    Date: Friday, 28 January 2000 11:50
    Subject: Re: Corpora: What is a corpus

    >This turns out to be quite an interesting discussion, since it really
    >hinges on what a "proverb" is. If Francois had said (say) a corpus of
    >sermons, or a corpus of advertisements, or a corpus of texts composed
    >by 18th-century French expatriate seamen with wooden legs, I don't
    >think Oliver would have turned a hair (well, maybe in the last
    >example) because all of those things are definable as types of text or
    >artefact or entity or whatever. But proverbs don't seem to fit in with
    >that list of things somehow: where would you look for proverbs? They
    >don't typically appear in isolation -- you don't go to the book shop
    >and say "What proverbs have been published lately?" -- the newspapers
    >don't have lists of today's hot proverbs -- no-one ever says "I think
    >I'll create a proverb today" -- all of which makes me think that a
    >proverb is not a text, but a judgment about a bit of a text. A
    >collection of things-judged-proverbial is an interesting text,
    >certainly, but it doesn't seem to be a corpus as we currently
    >understand the term.
    >
    >So while I agree with Lucian (and everyone else) that it's the act of
    >filtering which defines a corpus, I feel the need to define the nature
    >of the holes in the filter a bit more precisely. In other words, I
    >think we need a definition for the *components* of a corpus, which
    >would accept (say) a classified advert or a conversation with a travel
    >agent but reject a metaphor or a proverb or even (here I feel the
    >ground a bit shaky) a sentence containing a past tense verb.
    >
    >Lou
    >
    > ----------------------------------------------------------------
    > Lou Burnard http://users.ox.ac.uk/~lou
    > ----------------------------------------------------------------