Corpora: Portuguese corpora: copyright and internet access

Diana Maria de Sousa Marques Pinto dos Santos (Diana.Santos@informatics.sintef.no)
Wed, 24 Nov 1999 14:14:49 +0100

Dear Mark,

The copyright problem is really a vexing issue.
Even though you mention that you are interested in historical texts, most
of them seem to have been published considerably less than 70 years ago,
which suggests that the publishers may still hold some copyright.

The problem of making language resources available has been one of the
focal points of our project, Computational Processing of Portuguese
(http://www.portugues.mct.pt).

In particular, we have been concerned with giving access to Portuguese
corpora through the Internet (see http://cgi.portugues.mct.pt/acesso/ for
the present version -- some documentation is still missing...) and, as far
as copyright is concerned, it looks as if one needs to:

1) ask permission from the owner/author of the texts (author _and_
publisher, or simply publisher, depending on copyright ownership)
2) ask permission from the compiler of the corpus
3) ask permission from the distributor of the corpus
[And then you will still have to be clear, at the very least, about whether
you are asking for your own use or for everyone's use; the latter would be
the case if you want to give general access through the Internet.]

See the case of the MLCC corpus (a corpus of public debates and questions
in the European Parliament, originally published in the Official Journal of
the European Communities), which was compiled by Henry Thompson and his
team, from whom we got permission. It is, however, distributed by ELRA (see
http://www.icp.grenet.fr/ELRA/cata/text_det.html#mlcc, resource
ELRA-W0007), and the license under which the CDROM was bought does not
allow further distribution. We have therefore been waiting since 18 June
1999 for an answer from ELRA on whether we may grant access to the
Portuguese part through our Internet service.

This clearly illustrates how many people, institutions and copyright
holders can be involved even for content that is otherwise public.

In any case, we would be glad to try to help you with your Portuguese texts
and possibly also to distribute them through our site. This applies to
anyone who may be engaged in the process of compiling corpora of
Portuguese. We are already in contact with many of you, but let me restate
this in case there are other readers of the corpora list who have not heard
about our project.

Diana


**************************************************************************
Diana Santos Computational processing of Portuguese

SINTEF Telecom and Informatics Tel. (direct line) +47 22 06 73 12
Forskningsveien 1 Tel. +47 22 06 73 00
Box 124 Blindern Fax. +47 22 06 73 50
N-0314 Oslo Email: Diana.Santos@informatics.sintef.no
Norway http://www.portugues.mct.pt/
**************************************************************************