Corpora: CFP: Speech Annotation and Corpus Tools

Steven Bird (sb@unagi.cis.upenn.edu)
Thu, 08 Jul 1999 11:03:48 EDT

*** SECOND ANNOUNCEMENT ***

SPEECH COMMUNICATION

CALL FOR PAPERS

Special Issue on SPEECH ANNOTATION AND CORPUS TOOLS
[http://www.ldc.upenn.edu/annotation/specom.html]

Submission Deadline: 30 August 1999

Guest editors: Steven Bird and Jonathan Harrington

Aims and Scope of Speech Communication (from the journal homepage)

Speech Communication is an interdisciplinary journal whose primary
objective is to fulfil the need for the rapid dissemination and
thorough discussion of basic and applied research results. In order to
establish frameworks to inter-relate results from the various areas of
the field, emphasis will be placed on viewpoints and topics of a
transdisciplinary nature. ... The journal's primary objectives are:
to present a forum for the advancement of human and human-machine
speech communication science; to stimulate cross-fertilization between
different fields of this domain; to contribute towards the rapid and
wide diffusion of scientifically sound contributions in this domain.
General information about Speech Communication, the official journal
of the European Speech Communication Association, can be found at
[www.elsevier.com/locate/specom].

Scope of the Special Issue

Submissions are invited for a special issue of Speech Communication on
Speech Annotation and Corpus Tools. The aim of the special issue is
to make speech scientists aware of recent developments in the
representation and management of annotated speech corpora,
i.e. collections of speech signal data with time-aligned
transcriptions. (Signal data may be audio or physiological, natural
or artificial, in basic or derived form.) The primary focus is the
structure of annotations and of annotated corpora, as used within and
across a wide range of disciplines concerned with spoken human
communication.

Annotated speech corpora have been a critical component of research in
the speech sciences for some years. Today, these corpora are being
created and deployed for a rapidly expanding set of languages,
disciplines and technologies. A wealth of formats and tools have
sprung up around this enterprise, a diversity which at once
facilitates and frustrates progress. The linguistic annotation page
[www.ldc.upenn.edu/annotation/] has drawn attention to the scale of
ongoing activity, to the existence of diverse approaches to similar
problems and of similar approaches to diverse problems. Despite the
explicit formats and well-documented user interfaces, insights about
the structure of the annotations themselves are often buried in coding
manuals and internal data structures. There is a pressing need for
papers which document the corpora and tools, which identify notational
and functional equivalences among different approaches, and which
report on new approaches to core representational problems.

The special issue will consider papers which address theoretical and
practical issues concerning the representation of annotations, the
structure of annotated corpora, and the design, analysis and
implementation of tools for creating, browsing, searching,
manipulating and transforming annotations and annotated corpora.
In each case, the description of annotation structures or
tools should be accessible to readers outside the particular community
in which the system originated.

A broad sampling of relevant issues is given below:

+ representational issues:
- sequence, overlap, hierarchy
- simultaneous cross-cutting hierarchies
- the nature of labels
- pointers and cross-references
- temporal structure, instants and periods
- atemporal information (e.g. demographic data)

+ relationships between annotations and signals:
- multiple independent annotations of a single signal
- single annotations which reference multiple signals
- annotations which reference other annotations

+ database issues:
- structuring annotations, signals and atemporal data into a corpus
- indexing for efficient access of large corpora
- high and low level query languages, cross-level query
- validation, update, provenance
- data transformation and integration
- file formats, storage, transfer; the place of XML

+ implementation issues:
- design philosophies and functionalities for annotation toolkits
- approaches to creation, browsing, navigation, display
- reusability, interoperability, platform independence
- integration with independent tools (e.g. statistical analysis)
- techniques for working with multiple corpus formats

+ wider issues:
- methodologies for research and development involving annotated corpora
- the cycle of refining annotations and refining theoretical models
- the role of annotated corpora in evaluating theories and systems
- necessary steps towards general purpose tools and formats

Important Dates

* 400-Word Abstracts: any time in May-July
* Advance Notification: Monday August 16th, 1999
* Submission Deadline: Monday August 30th, 1999
* Acceptance Decision: late October, 1999
* Final Version Due: late January, 2000
* Publication Date: mid 2000

Advance Notifications

1. Prospective authors are encouraged to submit a 400-word abstract of
their paper so that the editors can comment on its suitability for the
special issue. These abstracts should be formatted as ASCII text and
submitted by email to both editors.

2. To facilitate a rapid review process, authors are required to give
notification of their submission two weeks in advance of the
submission deadline. Notification should consist of the title and (a
draft of) the final abstract, formatted as ASCII and emailed to both
editors.

Submissions

All submissions must consist of original unpublished work that is not
being submitted for publication elsewhere. Papers should be
approximately 30 pages, double-spaced. Electronic submission is
encouraged. Details about preparation of electronic and paper
submissions, and any updates to the CFP, will be posted on the web at
[http://www.ldc.upenn.edu/annotation/specom.html]. Please register at
this site to receive email notification of any subsequent
announcements concerning the special issue.

--
Dr Jonathan Harrington
Director, Speech Hearing and Language Research Centre,
Department of Linguistics, Macquarie University
Sydney, NSW 2109, Australia.
Tel: +61 2 9850-8740  Fax: +61 2 9850-9199
jmh@srsuna.shlrc.mq.edu.au
http://www.ling.mq.edu.au/dbase/person.phtml?oid=19084

Dr Steven Bird
Associate Director, Linguistic Data Consortium
University of Pennsylvania, 3615 Market St, Suite 200
Philadelphia, PA 19104-2608, USA
Tel: +1 215 573-3352  Fax: +1 215 573-2175
sb@ldc.upenn.edu
http://www.ldc.upenn.edu/sb