Re: Message Router delivery notification message

Alice Carlberger (alice@speech.kth.se)
22 May 1995 12:54:11 +0200

Received: from OUVAXA.CATS.OHIOU.EDU by spider.speech.kth.se with SMTP
(1.37.109.16/16.2) id AA124977345; Fri, 19 May 1995 17:29:05 +0200
Return-Path: <MRGATE%OUVAX.dnet@ouvaxa.cats.ohiou.edu>
Received: by ouvaxa.cats.ohiou.edu (MX V4.1 VAX) id 22; Fri, 19 May 1995
11:33:40 EDT
Date: Fri, 19 May 1995 11:33:39 EDT
From: MRGATE%OUVAX.dnet@ouvaxa.cats.ohiou.edu
To: alice@speech.kth.se
X-Vmsmail-To: MX%"alice@speech.kth.se"
Message-Id: <00990964.D9D317C0.22@ouvaxa.cats.ohiou.edu>
Subject: Message Router delivery notification message

RE Message ID: G2020D65319MAY199511304403
Generated by node: OUVAX.ALL-IN-1

UA content ID: Sp,It,Fr corpora

Attempted delivery to:

Route : @A1 <--
Userid : BATES
Arrival date : 19-MAY-1995 11:31

This delivery failed. Failure reason was "unable to transfer".
Diagnostic was "unrecognised recipient name".

Message-id: G2020D65319MAY199511304403
From: NAME: <MX%"alice@speech.kth.se"@MRGATE@OUVAX>
Subject: Sp,It,Fr corpora
Date: 19-May-1995
Posted-date: 19-May-1995
Precedence: 1
To: BATES@A1

Return-Path: <owner-corpora@lists.uib.no>
To: corpora@lists.uib.no
Subject: Sp,It,Fr corpora
Date: Fri, 19 May 1995 12:53:38 +0200
From: Alice Carlberger <alice@speech.kth.se>
Sender: owner-corpora@lists.uib.no

A few weeks ago, I sent out two queries on the Corpora list: one for (free)
Spanish, Italian, and French text corpora; the other for (free) Norwegian
text corpora. Here is what I found:

I. NORWEGIAN

1. The European Corpus Initiative Multilingual Corpus CD can be purchased
cheaply from Edinburgh, eucorp@cogsci.ed.ac.uk. It contains several Norwegian
novels from the 1950s and 1970s as well as a few Ibsen plays.

2. Humanistisk datasenter at the University of Bergen has systematically
collected Norwegian materials but has no real corpora yet. The texts, which
can be purchased on certain conditions, are the following:

a. 60 Norwegian novels (30 in Bokmaal, 30 in Nynorsk, from 1937, 1957,
and 1977).

b. Historical newspaper texts from 4 Norwegian newspapers
(60,000 - 100,000 words (each?); from the years 1900, 1925, and 1950).

c. Newspaper text from 1980-1983: 900,000 Bokmaal words and 500,000
Nynorsk words. 50% is from local newspapers.

d. Newspaper text from 1994 and 1995 from Bergens Tidende (both
Bokmaal and Nynorsk).

II. SPANISH, ITALIAN, FRENCH

1. /pub/corpus/:
a. Oral corpus of Spanish (7 MB, about 2,000,000 words)
b. Some written corpora of South American Spanish

2. The LDC is the best source, but joining costs money.

3. The Oxford Text Archive
13 Banbury Road
Oxford OX2 6NN
fax: +44 865 273275

Catalogue of over 1300 titles, available on paper or in electronic form:
on the Oxford VAX Cluster as OX$DOC:TEXTARCHIVE.LIST and
OX$TEXTARCHIVE.SGML; from various ListServers, e.g., LISTSERV@BROWNVM (send
the mail message GET HUMANIST FILELIST for details); or by anonymous FTP from
Internet site ota.ox.ac.uk (163.1.2.4) in the directory pub/ota/public.
Also, wherever you are, you can send a note to ARCHIVE@VAX.OXFORD.AC.UK
specifying which form you want.

Italian

a. Corpus of Italian newspapers.
b. Literary works (Dante Alighieri, etc.), dialectal texts (?), short stories.

Spanish

a. Literary works, poems.

4. 1066108 words (approx.)
Origin: Grupo EUROTRA, Universidad Autonoma de Madrid
Contact: Manuel Campos, eurotrac@ccuam3.sdi.uam.es or
Fernando Sanchez Leon, Laboratorio de L
Available: Publicly via anonymous ftp, node lola.lllf.uam.es,
directory pub/corpus
Contents: transcriptions of spoken language (conferences, conversations, etc.)

5. 121051 words (approx.)
Origin: CHILDES (Child Language Data Exchange System) database, Carnegie Mellon
Univ.
Contact: Brian MacWhinney, brian@andrew.cmu.edu
Available: Publicly, after prior communication with Brian MacWhinney
Contents: Database of corpora of parent-child and child-child interactions
from children speaking.

6. 9,000,000 words (approx.)
Origin: This is the European Corpus Initiative Multilingual Corpus I CD-ROM
Cost: 20 Pounds
Contact: eucorp@cogsci.ed.ac.uk
Available: All use of this corpus is subject to a licence agreement.
The CD-ROM is available in the US from the Linguistic Data Consortium (LDC),
for members of the LDC or those making a bulk purchase, and otherwise from
ELSNET, 2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND. The cost from ELSNET
is 20 UK Pounds plus postage, handling and tax where applicable. Ordering
procedure is detailed in

http://www.cogsci.ed.ac.uk/elsnet/eci.html

7. University of Barcelona: spoken corpus

8. University of Pisa: newspaper corpus

--------------------------------------------------------------------------------
Received: from lists.uib.no (alfred.uib.no) by OUVAXA.CATS.OHIOU.EDU (MX V4.1
VAX) with SMTP; Fri, 19 May 1995 11:30:20 EDT
Received: (daemon@localhost) by lists.uib.no (8.6.10/8.6.10) id MAA11102 for
corpora-ut; Fri, 19 May 1995 12:54:32 +0200
Received: from alfie.uib.no (alfie.uib.no [129.177.30.13]) by lists.uib.no
(8.6.10/8.6.10) with SMTP id MAA11097 for <corpora@lists.uib.no>;
Fri, 19 May 1995 12:54:28 +0200
Received: from spider.speech.kth.se by alfie.uib.no with SMTP (PP) id
<06426-0@alfie.uib.no>; Fri, 19 May 1995 12:54:24 +0200
Received: from cepstrum.speech.kth.se by spider.speech.kth.se with ESMTP
(1.37.109.16/16.2) id AA113030820; Fri, 19 May 1995 12:53:40 +0200
Message-ID: <199505191053.AA113030820@spider.speech.kth.se>
X-Mailer: exmh version 1.6 4/21/95
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Precedence: bulk