RE: [Corpora-List] Legal aspects of compiling corpora

From: Sampo Nevalainen (samponev@cc.joensuu.fi)
Date: Thu Jun 19 2003 - 10:26:31 MET DST


    Hi,

    >then we will face another problem of comparing approaches and techniques,
    >if each of us use different corpora (without any possibility to share it
    >with others because of the legal aspects) then no comparison will be possible.

    My comment is clearly off topic, but I could not resist... This is one
    thing I have never fully understood since I was irrevocably taken with
    CL. Many textbooks on CL give the impression that a corpus should have a
    finite size and be "a standard reference" (as McEnery and Wilson put it in
    "Corpus Linguistics", 1996). In my humble opinion, this is rather unnatural,
    as, after all, we are studying an open, ever-growing, dynamic, lively
    organism (unless we are interested in "dead" languages). From this
    viewpoint, if we are going to generalize anything about a language, I at
    least would have more confidence in results based on several different
    corpora than in a detailed description of a single corpus. As with weather
    forecasts or climate studies, the more measurement points are available,
    the more reliable the results. (Clearly, one practical solution is a kind
    of "monitor corpus", or the Internet. I understand that how crucial this
    question is depends a lot on the purpose(s) of the corpus and the aim(s) of
    the researcher, which, I think, should converge to some extent.) Of course,
    the other side of the coin is economy. It would be a huge waste of money
    and resources if everybody had to compile corpora of their own, and
    preferably non-stop!

    sincerely
    Sampo

            



    This archive was generated by hypermail 2b29 : Thu Jun 19 2003 - 10:31:43 MET DST