New speech corpus from LDC

ldc@unagi.cis.upenn.edu
Fri, 28 Apr 95 15:59:01 EDT

Announcing

A New Corpus from
the Linguistic Data Consortium

The TRAINS Spoken Dialog Corpus

This CD-ROM contains a corpus of task-oriented spoken
dialogs. These dialogs were collected as part of the TRAINS project,
a project to develop a conversationally proficient planning
assistant, which helps a user construct a plan to achieve some task
involving the manufacturing and shipment of goods in a railroad
freight system. The collection procedure was designed to make the
setting as close to human-computer interaction as possible, but it
was not a "wizard" scenario, in which one person pretends to be a
computer. These dialogs thus provide a snapshot of an ideal
human-computer interface, one able to engage in fluent
conversations.

Altogether, this corpus includes 98 dialogs, collected using
20 different tasks and 34 different speakers. This amounts to six and
a half hours of speech, about 5,900 speaker turns, and 55,000
transcribed words.

Information on other LDC databases is available via anonymous ftp, including
a complete catalog, details on corpora, membership and other licensing forms,
and some samples of data. Connect to ftp.cis.upenn.edu, log in as anonymous,
give your email address as the password, and change to the directory pub/ldc.

The LDC's WWW Home Page holds the LDC catalog and the "readme" files for
all released corpora. It can be accessed at the URL

ftp://ftp.cis.upenn.edu/pub/ldc_www/hpage.html