Re: [Corpora-List] Legal aspects of compiling corpora

From: Mark Sanderson (m.sanderson@shef.ac.uk)
Date: Fri Jun 13 2003 - 22:55:58 MET DST


    Someone has just kindly pointed out that I was wrong to say that no single
    organisation holds terabytes of data (apart from search engines).
    Organisations like Lexis Nexis have such quantities.

    At 15:47 13/06/03 -0400, William Mann wrote:
    >Without making the problem more difficult, I want to point out that very
    >similar problems arise in discourse linguistics, where the objects of study
    >are connected texts, often necessarily whole texts.
    >
    >If a researcher wants to make claims about a whole text, for example about
    >how coherence arises, it is often necessary to exhibit the whole text so
    >that such claims are examinable. And just as for Corpus Linguistics, the
    >texts cannot be made examinable like sentences in a grammar paper, because
    >bulk prohibits such large citations.
    >
    >There has been a lot of implicit reliance on "fair use," accompanied by
    >circulation on the internet. It would be hard for discourse linguistics to
    >achieve open discussion of results and evidence without something like this.
    >==================
    >
    >There is another locus of examination which might turn out to be very
    >relevant. I know about it, but not the details. The Oxford Text Archive
    >promotes the protection and circulation of extensive works. They put a lot
    >of effort into these issues, including copyright legalities, not
    >diminishing the rights of a contributor of a piece, and not creating
    >unjustified claims of rights for the Archive itself.
    >
    >The result is a multipage License agreement that potential submitters agree
    >to.
    >
    >They are at http://ota.ahds.ac.uk/ .
    >
    >I agree with Doug Cooper that we ought to take a stance. But who is "we"?
    >
    >Perhaps one of the new departments of corpus science could take leadership
    >on this. It would give it an air of professionalism.
    >
    >Bill Mann
    >
    >----- Original Message -----
    >From: "Doug Cooper" <doug@th.net>
    >To: <corpora@hd.uib.no>
    >Sent: Friday, June 13, 2003 2:22 PM
    >Subject: Re: [Corpora-List] Legal aspects of compiling corpora
    >
    >
    >| At 14:40 13/6/03 +0100, Mark Sanderson wrote:
    >| > I think the honest answer is that it is a question with no clear
    >| > answer.
    >|
    >| Not so clear. The original query was whether a 100-
    >| character citation of a text would be a copyright violation.
    >| Is there a copyright law anywhere that does not grant
    >| "fair use" rights to this sort of minimal citation in all but
    >| pathological cases (eg. extremely short texts like song
    >| lyrics, or perhaps many consecutive citations of a
    >| single text)?
    >|
    >| In any case, this question comes up periodically, and the
    >| response is almost invariably something along the lines of
    >| 'well, you'll probably get away with it.'
    >|
    >| I am rather surprised that the corpus-using community has
    >| not come out with a position statement -- not everybody has
    >| to sign on to it, of course -- that articulates the point of view
    >| that:
    >|
    >| a) distributing minimal citations of copyrighted texts, and
    >| b) allowing public, indirect access to privately held collections
    >| of copyrighted texts for statistical purposes
    >| are:
    >| a) a necessary part of corpus linguistics research, and
    >| b) believed by CL practitioners to be inherently protected
    >| as fair use, particularly in non-profit research contexts.
    >|
    >| and perhaps also gives a few examples of what might _not_
    >| be considered professional conduct; eg. making full texts
    >| available or easily reconstructed.
    >|
    >| It seems to me that such a statement would be useful in:
    >|
    >| a) helping to clarify that CL applications promote the
    >| 'Progress of Science;' ie. are a genuine research use;
    >| b) helping individual researchers show that they are
    >| acting in good faith, in accordance with others in the
    >| profession.
    >|
    >| Obviously, a bunch of us getting together and saying that
    >| black is white won't make it so. But to the extent that there
    >| _is_ a possible gray area in the balance between copyright
    >| and fair use, I think it is important to start to establish our side's
    >| position as well.
    >|
    >| Doug Cooper
    >|


