[Corpora-List] RE: Legal aspects of compiling corpora

From: R.M.Salkie@bton.ac.uk
Date: Fri Jun 13 2003 - 18:17:39 MET DST


    Let's separate the moral question from the legal question.

    The moral question is, are you doing anything wrong if you include text from
    someone else's web page in your corpus?

    My answer: Presumably the relevant general moral principle is that you
    should not deprive the original author of money which is rightfully theirs.
    This must depend on whether you are using their text purely for
    non-commercial research, or in order to make money yourself.

    If you are using someone else's words purely for research, that is morally
    right, in my opinion. In fact, you are likely to be increasing their
    income, because you are giving them free publicity by including their words
    in your corpus.

    Now consider the harder case where you use someone else's words in a corpus
    to help you write a textbook which sells millions of copies. Some people
    might argue that the original author is morally entitled to a share of your
    money. A counterargument would be that the original text was written to be
    read, not to be included in a corpus and (for example) searched for frequent
    collocations. The textbook writer has used the original text as data, not
    for its intellectual content, and it is the analysis of the data which gives
    the text its commercial value. Therefore the original author has no moral
    right to any of the money from the textbook.

    (Compare these two cases: (1) a textbook writer enhances her book by citing
    a page from someone's research article which contains supporting arguments.
    (2) a textbook writer uses that same page from the research article to
    illustrate the use of connectives in academic texts. In case (1) the
    original author has a moral claim on some of the money generated by the
    textbook. In case (2) the original author does not have a moral claim, I
    think -- the argument above about free publicity applies, instead. It would
    be interesting to know what other list members think about this.)

    The argument that it is the analysis by the corpus scholar which creates the
    commercial value of a text in a corpus can perhaps be taken further.
    Suppose I take a printed book which is currently on sale and making money
    for its author, scan it into electronic form, and use it in my corpus for
    commercial purposes such as textbook writing. This is probably the hardest
    case, since both parties involved have made money out of the same text.
    Perhaps even in this case the original author has no claim to a share of my
    profits. It could be argued, indeed, that the original author should feel
    honoured that I am using their text in this way.

    Using someone else's data for corpus analysis, where that data is in the
    public domain (free or for a price, it makes no difference to the moral
    question), is no different from using any other experimental data. You
    have a moral duty to the
    person who supplied the data, and to your professional colleagues, to
    acknowledge the source of the data, and sometimes you should anonymise the
    data so as not to humiliate the person who supplied it; but I don't think
    you owe them any money that you earn from using the data.

    The legal question is different. I concur with Adam Kilgarriff's earlier
    statement that it depends on how rich your enemies are. On the other hand,
    if you can show that you have taken into account the best current thinking
    about the moral question, that might strengthen your case before a court.

    Any comments?

    Raphael Salkie
    School of Languages
    University of Brighton, England

    -----Original Message-----
    From: delucca@nilc.icmc.usp.br [mailto:delucca@nilc.icmc.usp.br]
    Sent: 13 June 2003 13:49
    To: corpora@hd.uib.no
    Subject: [Corpora-List] Legal aspects of compiling corpora

    Dear Linguists and Lawyers,

    I am troubled by the legal aspects of compiling corpora. I am unsure
    whether it is illegal to store web pages (or parts of them)
    in a database (see http://www.dictionarium.com/project.htm)
    that is not available to the public, and to display their contents as
    short collocations of fewer than 100 characters at a time via a search.

    On the other hand, Internet search engines use cached (temporary?)
    copies of sites and display a short excerpt of the web pages.

    Is my procedure wrong? What is the legal difference? Do I need to ask
    permission from each website to store its pages? If I mention the source
    and the author, will I be protecting the copyright?
     

    I look forward to hearing from you.

    Yours Sincerely,

    J. L. De Lucca



