Corpora: announcing TalkBank

Brian MacWhinney (macw@cmu.edu)
Fri, 04 Jun 1999 16:23:18 -0400

Project Announcement for
"TalkBank: A Multimodal Database of Communicative Interaction"

The goal of TalkBank is to create a distributed, web-based data
archiving system for transcribed video and audio data on communicative
interactions. TalkBank builds on our experience with CHILDES and LDC
corpora, and is expected to be a major new tool for the social
sciences. TalkBank data will be stored in an XML-based transcription
framework incorporating richly structured, time-aligned annotations.

For detailed information, please consult:
CMU - http://childes.psy.cmu.edu/talkbank.html
Penn - http://www.ldc.upenn.edu/annotation/talkbank.html

We believe that TalkBank will benefit four types of research enterprises:

Cross-corpora comparisons. For those interested in quantitative
analyses of large corpora, TalkBank will provide direct access to
enormous amounts of real-life data, subject to strict controls
designed to protect confidentiality.

Folios. Other researchers wish to focus on qualitative analyses
involving the collection of a carefully sampled folio or casebook of
evidence regarding specific fine-grained interactional
patterns. TalkBank programs will facilitate the construction of
these folios.

Single corpus studies. For those interested in analyzing their own
datasets rather than the larger database, TalkBank will provide a
rich set of open-source tools for transcription, alignment, coding,
and analysis of audio and video data.

Collaborative commentary. For researchers interested in contrasting
theoretical frameworks, TalkBank will provide support for entering
competing systems of annotations and analytic profiles either
locally or over the Internet.

The creation of this distributed database with its related analysis
tools will free researchers from many tedious aspects of data analysis
and will stimulate fundamental improvements in the study of
communicative interactions. The initiative unites ongoing efforts
from the Linguistic Data Consortium (LDC) at Penn, the Penn Database
Group, the Informedia Project at CMU, and the CHILDES Project at
CMU. The initiative also establishes an ongoing interaction among
computer scientists, linguists, psychologists, sociologists, political
scientists, criminologists, educators, ethologists, cinematographers,
psychiatrists, and anthropologists.

We are seeking a variety of funding possibilities for TalkBank, and we have
recently received a commitment of support from NSF for initial planning
meetings. We are also using the initiative to foster wide-ranging cooperation
among ongoing research efforts. The TalkBank homepage
[http://www.ldc.upenn.edu/annotation/talkbank.html] lists current
participants and has a pointer to a document giving a detailed exposition of
our vision for TalkBank.

We invite anyone interested in participating actively in TalkBank, or simply
in offering suggestions and criticism, to contact one or more of us:

Brian MacWhinney (Psychology, CMU)
Howard Wactlar (Computer Science, CMU)
Peter Buneman (Computer Science, U Penn)
Mark Liberman (Linguistic Data Consortium, U Penn)
Steven Bird (Linguistic Data Consortium, U Penn)

***********************

This message is being posted on June 4, 1999, to the mailing lists named below.
Our apologies if you receive multiple copies.

If you think this announcement should be posted to additional mailing lists,
please send the addresses of those lists to Brian MacWhinney (macw@cmu.edu).
It is particularly important to reach additional lists outside the domains
of linguistics and psycholinguistics. Many thanks.

corpora@hd.uib.no
elsnet-list@let.ruu.nl
empiricists@unagi.cis.upenn.edu
language-culture@cs.uchicago.edu
linganth@cc.rochester.edu
linguist@listserv.linguistlist.org
ap-mate@mate.mip.ou.dk
nl-kr@cs.rpi.edu
info-childes@childes.psy.cmu.edu
info-psyling@psy.gla.ac.uk
funknet@rice.edu