Corpora: CHRISTINE Corpus, Stage I

Geoffrey Sampson (geoffs@cogs.susx.ac.uk)
Tue, 3 Aug 1999 12:26:20 +0100

CHRISTINE Corpus, Stage I

Stage I of the CHRISTINE Corpus is now available. It
comprises a structurally-annotated cross-section of
spontaneous 1990s speech drawn from all UK regions,
social classes, etc. The annotation scheme is that of the
well-established SUSANNE Corpus, and is defined in detail
in G.R. Sampson, _English for the Computer_, Clarendon Press
(Oxford University Press), 1995.

CHRISTINE/I is described in detail in its Documentation
file, which is available on the Web at
http://www.cogs.susx.ac.uk/users/geoffs/ChrisDoc.html
(250 kb, about 35,000 words). Another Web page,
http://www.cogs.susx.ac.uk/users/geoffs/RChristine.html,
discusses the background and aims of the CHRISTINE Project.
The Corpus can be downloaded by anonymous ftp. The URL is
ftp://ftp.cogs.susx.ac.uk/pub/users/geoffs/CHRISTINE1.tar.Z
Use "uncompress" to decompress the file, and then
"tar -xf" to unpack the resulting tar file into its 84
component files (which include a copy of the Documentation
file).
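
As a rough illustration of the download and unpacking steps
just described, a shell sketch along the following lines may
help. It assumes that curl is installed with FTP support (any
FTP client would serve equally well), that the ftp address
given above is still reachable, and that the local tar accepts
the -C option. The function names are invented for this
example and are not part of the Corpus distribution.

```shell
#!/bin/sh
# Archive location as given in the announcement
# (assumption: it is still reachable).
URL="ftp://ftp.cogs.susx.ac.uk/pub/users/geoffs/CHRISTINE1.tar.Z"

# Download step. Assumption: curl is installed; any FTP client would do.
# Saves the archive under its remote name in the current directory.
fetch_christine() {
    curl -O "$URL"
}

# Decompress and unpack in one pass.
# $1 = path to the .tar.Z file, $2 = directory to unpack into.
# "uncompress -c" writes the decompressed tar stream to standard
# output, so the original .Z file is left in place.
unpack_christine() {
    mkdir -p "$2" || return 1
    uncompress -c "$1" | tar -xf - -C "$2"
}

# Count regular files under a directory, for comparison with the
# 84 component files mentioned in the announcement.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}
```

A typical session, under the same assumptions, would run
fetch_christine, then unpack_christine CHRISTINE1.tar.Z christine1,
then count_files christine1 to check the number of unpacked
files against the figure quoted above.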

CHRISTINE/I includes about 40% of the eventual complete
CHRISTINE Corpus. The complete Corpus is expected to be
ready for distribution early in the year 2000.

Prof. Geoffrey Sampson

School of Cognitive & Computing Sciences
University of Sussex
Falmer, Brighton BN1 9QH, GB

e-mail geoffs@cogs.susx.ac.uk
tel. +44 1273 678525
fax +44 1273 671320
Web site http://www.grs.u-net.com