Re: [Corpora-List] Legal aspects of compiling corpora

From: Torzec Nicolas ATER LSI (nicolas.torzec@enssat.fr)
Date: Tue Jun 17 2003 - 10:55:48 MET DST


    Dear Linguists and Lawyers,
    I have the same "problem" with a large (tagged) monitor corpus of
    texts drawn from written French online forums:
    - these messages are publicly available in the sense that anyone
    can read and reuse them
    - each newsgroup server stores and uses its own copies of them
    - search engines use and exploit cached copies of them
    - ...

    So,
    - Is it illegal to store these messages, in anonymized form, in a
    database?
    - Is it illegal to use this corpus for research purposes (i.e. to
    carry out linguistic studies and to develop NLP tools using
    corpus-based machine learning methods)?
    - Is it illegal to illustrate scientific articles with examples from
    this corpus?

    Do I need to ask each author for permission to store and use their
    messages? What if I cite the source and the author? What about
    copyright?

    Moreover,
    - What if I want to make my corpus publicly available to researchers?
    - What if NLP tools developed from this corpus are to be integrated
    into commercial products?

    Thank you in advance for your help...
    References, pointers and suggestions are welcome, especially on the
    legal aspects in France...

    Nicolas Torzec

    --
    Nicolas Torzec
    PhD student in NLP
    --
    

    delucca@nilc.icmc.usp.br wrote:
    >
    > Dear Linguists and Lawyers,
    >
    > I am troubled by the legal aspects of compiling corpora. I am unsure
    > whether it is illegal to store web pages (or parts of them) in a
    > database (see http://www.dictionarium.com/project.htm) that is not
    > available to the public, and to display its contents through a search
    > function as short collocations of fewer than 100 characters at a time.
    >
    > On the other hand, Internet search engines use cached (temporary?)
    > copies of sites and display short extracts of the web pages.
    >
    > Is my procedure wrong? What is the legal difference? Do I need to ask
    > each website for permission to store its pages? If I mention the
    > source and the author, will I be respecting the copyright?
    >
    > I look forward to hearing from you.
    >
    > Yours sincerely,
    >
    > J. L. De Lucca

    -- Nicolas TORZEC

    ENSSAT / Université de Rennes 1, 6 rue de Kerampont, 22300 Lannion

    Email: nicolas.torzec@enssat.fr
    Tel: 02.96.46.27.30
    Fax: 02.96.37.01.99
    Web: http://www.enssat.fr



    This archive was generated by hypermail 2b29 : Tue Jun 17 2003 - 13:08:07 MET DST