Corpora: copyright and internet access

Jeff ALLEN (jeff@elda.fr)
Fri, 17 Dec 1999 14:47:04 +0100

Dear Corpora readers,

The subject of copyright issues for language resources, data,
corpora, etc., has surfaced on the Corpora list again over these
last few weeks. This debate has come up several times on this
list over the past couple of years. Rather than reiterating what
has already been said, we simply note that, because copyright is
a very complicated issue, the European Language Resources
Association (ELRA) and its Distribution Agency (ELDA) have
been set up by the European Commission to deal with such
issues. ELRA/ELDA offers assistance to members regarding
legal matters concerning the availability, distribution, and use
of language resources.

With regard to Language Resources that are distributed
through ELDA:

1) all of the legal issues have already been settled with the
providers/owners of the resources and data;
2) the data is available for Language Engineering and Human
Language Technology development. No re-distribution of the
data is allowed. This reflects a commitment made to the
providers/owners that have furnished the data for distribution.
This point is clearly indicated in all Language Resource user
agreements written up by ELRA/ELDA.

Concerning the MLCC language resource (see
http://www.icp.grenet.fr/ELRA/cata/text_det.html#mlcc,
resource ELRA-W0007):

As stipulated in the distribution and end-user contracts, this
database cannot be made accessible via the Internet. This is
clearly indicated in articles 4 and 6 of the end-user contract
for this language resource, cited below.

article 4. END-USER is not permitted to reproduce the
Language Resources for commercial or distribution purposes
and to commercialise (or distribute for free) in any form or by
any means the Language Resources or any derivative product
or services based on all or a substantial part of it.

article 6. Without prejudice to the other provisions, the rights
referred to herein shall be non transferable to any other entity.
The Language Resources shall not be transferred to or
accessed from any other site.

We hope that this explanation answers the questions asked.

Khalid Choukri
ELRA CEO

At 14:14 24/11/99 +0100, Diana Maria de Sousa Marques Pinto dos Santos wrote:
>Dear Mark,
>
>The copyright problem is really a vexing issue.
>Even though you mention that you are interested in historical texts, most
>of them seem to have been published less than 70 years ago,
>which indicates that the publishers may still hold some copyright.
>
>The problem of making language resources available has been one of the
>focuses of our project Computational Processing of Portuguese
>(http://www.portugues.mct.pt).
>
>In particular, we have been concerned with giving access to Portuguese
>corpora through the Internet (see http://cgi.portugues.mct.pt/acesso/ for
>the present version -- some documentation is still missing...) and, as far
>as copyright is concerned, it looks as if one needs to:
>
>1) ask permission from the owner/author of the texts (author _and_ publisher,
>or simply publisher, depending on copyright ownership)
>2) ask permission from the compiler of the corpus
>3) ask permission from the distributor of the corpus
>[And then you still will have, at least, to be clear about whether you are
>asking for your own use, or for everyone's use, which would be the case if
>you want to give general access through the Internet]
>
>See the case of the MLCC corpus (a corpus of public debates and questions
>in the European Parliament, originally published in the Official Journal of
>the European Communities), which was compiled by Henry Thompson and his
>team, from whom we got permission. Since it is distributed by ELRA (see
>http://www.icp.grenet.fr/ELRA/cata/text_det.html#mlcc, resource
>ELRA-W0007), and the license under which the CDROM was bought does not
>allow further distribution, we are still waiting (more precisely since 18
>June 1999) for an answer from ELRA, in order to know whether or not we get
>permission to grant access to the Portuguese part through our Internet
>service.
>
>This illustrates clearly how many people / institutions / copyright holders
>can be involved even for (otherwise) public content.
>
>In any case, we would be glad to try to help you with your Portuguese texts
>and eventually also distribute them through our site. This applies to
>anyone who may be engaged in the process of compiling corpora of
>Portuguese. We are already in contact with many of you, but let me restate
>this in case there are other readers of the corpora list who have not heard
>about our project.
>
>Diana

=================================================
Jeff ALLEN - Technical Director
ELRA / ELDA
55-57, rue Brillat-Savarin
75013 Paris FRANCE
Tel: (+33) 1.43.13.33.33
Fax: (+33) 1.43.13.33.30
mailto:jeff@elda.fr
http://www.elda.fr/