RE: [Corpora-List] Legal aspects of compiling corpora

From: Mark Davies (mdavies@ilstu.edu)
Date: Tue Jun 17 2003 - 17:54:50 MET DST

    When I was compiling the 100-million-word Corpus del Español (www.corpusdelespanol.org), I
    consulted two US professors who are experts on copyright law as applied to the
    Internet. I explained to them that in my corpus, at least, end users wouldn't have access
    to entire paragraphs of text, much less an entire text. Both agreed
    that copyright problems would be quite unlikely.

    What intrigues me about search engines like Google, however, is their "cached web page"
    functionality, in which they are in essence reproducing an entire web page -- and all of
    the web pages of a given site (assuming no use of robots.txt). This seems to be much
    more than the limited context that I (and others) make available in our corpora, and yet
    there has been no legal challenge.

    On the other hand, both of the professors I consulted mentioned that it's still a very
    murky issue with little or no clearly defined legal precedent -- at least in the US.

    Mark Davies

    =================================================
    Mark Davies
    Assoc. Prof., Spanish Linguistics
    Illinois State University
    http://mdavies.for.ilstu.edu/

    ** Corpus design and use // Web-database scripting **
    ** Historical and dialectal Spanish and Portuguese syntax **
    =================================================


