Re: [Corpora-List] Legal aspects of compiling corpora

From: William Mann (bill_mann@sil.org)
Date: Fri Jun 13 2003 - 21:47:20 MET DST


    Without wishing to complicate the problem further, I want to point out that
    very similar problems arise in discourse linguistics, where the objects of
    study are connected texts, often necessarily whole texts.

    If a researcher wants to make claims about a whole text, for example about
    how coherence arises, it is often necessary to exhibit the whole text so
    that such claims are examinable. And just as in corpus linguistics, the
    texts cannot be quoted in full the way example sentences are in a grammar
    paper, because their sheer bulk prohibits such large citations.

    There has been a lot of implicit reliance on "fair use," accompanied by
    circulation on the internet. It would be hard for discourse linguistics to
    achieve open discussion of results and evidence without something like this.
    ==================

    There is another locus of examination which might turn out to be very
    relevant. I know about it, but not the details. The Oxford Text Archive
    promotes the protection and circulation of extensive works. They put a lot
    of effort into these issues, including copyright legalities, not
    diminishing the rights of a contributor of a piece, and not creating
    unjustified claims of rights for the Archive itself.

    The result is a multipage License agreement that potential submitters agree
    to.

    They are at http://ota.ahds.ac.uk/ .

    I agree with Doug Cooper that we ought to take a stance. But who is "we"?

    Perhaps one of the new departments of corpus science could take leadership
    on this. It would give it an air of professionalism.

    Bill Mann

    ----- Original Message -----
    From: "Doug Cooper" <doug@th.net>
    To: <corpora@hd.uib.no>
    Sent: Friday, June 13, 2003 2:22 PM
    Subject: Re: [Corpora-List] Legal aspects of compiling corpora

    | At 14:40 13/6/03 +0100, Mark Sanderson wrote:
    | > I think the honest answer is that it is a question with no clear
    | > answer.
    |
    | Not so clear. The original query was whether a 100-
    | character citation of a text would be a copyright violation.
    | Is there a copyright law anywhere that does not grant
    | "fair use" rights to this sort of minimal citation in all but
    | pathological cases (e.g., extremely short texts like song
    | lyrics, or perhaps many consecutive citations of a
    | single text)?
    |
    | In any case, this question comes up periodically, and the
    | response is almost invariably something along the lines of
    | 'well, you'll probably get away with it.'
    |
    | I am rather surprised that the corpus-using community has
    | not come out with a position statement -- not everybody has
    | to sign on to it, of course -- that articulates the point of view
    | that:
    |
    | a) distributing minimal citations of copyrighted texts, and
    | b) allowing public, indirect access to privately held collections
    | of copyrighted texts for statistical purposes
    | are:
    | a) a necessary part of corpus linguistics research, and
    | b) believed by CL practitioners to be inherently protected
    | as fair use, particularly in non-profit research contexts.
    |
    | and perhaps also gives a few examples of what might _not_
    | be considered professional conduct; e.g., making full texts
    | available or easily reconstructed.
    |
    | It seems to me that such a statement would be useful in:
    |
    | a) helping to clarify that CL applications promote the
    | 'Progress of Science'; i.e., are a genuine research use;
    | b) helping individual researchers show that they are
    | acting in good faith, in accordance with others in the
    | profession.
    |
    | Obviously, a bunch of us getting together and saying that
    | black is white won't make it so. But to the extent that there
    | _is_ a possible gray area in the balance between copyright
    | and fair use, I think it is important to start to establish our side's
    | position as well.
    |
    | Doug Cooper
    |


