TEI Workshop at DL96

Nancy Ide (ide@univ-aix.fr)
Thu, 14 Mar 1996 13:28:36 +0000

*********************
* W O R K S H O P *
*********************

The Text Encoding Initiative Guidelines
and Their Application to Building Digital Libraries

March 23, 1996
9:30am - 3:30pm

Organizers

Nancy Ide
Vassar College, USA and CNRS, France

Judith Klavans
Columbia University, USA

Held in conjunction with
DIGITAL LIBRARIES'96
First ACM INTERNATIONAL CONFERENCE ON DIGITAL LIBRARIES
March 20-23, 1996
Hyatt Regency
Bethesda, Maryland USA

P R O G R A M

9:30 - 10:15 Overview of the TEI
Nancy Ide and Judith Klavans, TEI Steering Committee

10:15 - 10:45 The TEI in the Perseus Project
David A. Smith, Perseus project, Tufts University

10:45 - 11:15 How will library cataloging relate to TEI documents? Issues
on USMARC and TEI
Steven Davis, Columbia University

11:15 - 11:45 TEI and the American Memory Project at the Library of Congress
Debbie Lapeyre and Tommie Usdin, Atlis Consulting

11:45 - 12:15 Encoding two large Spanish corpora with the TEI scheme:
Design and technical aspects of textual markup
Marta Pino, Instituto de Lexicografía, Real Academia Española

12:15 - 1:00 Lunch

1:00 - 1:30 The Model Editions Partnership: Creating Editions of Historical
Documents for the Internet
David Chesnutt and Michael Sperberg-McQueen, Model Editions Project

1:30 - 2:00 Creating DTDs via Fred
Keith Shafer, OCLC Online Computer Library Center

2:00 - 2:30 Some Problems of TEI Markup and Early Printed Books
Julia Flanders, Brown University Women Writers Project

2:30 - 3:00 TEI and the National Digital Library Program
LeeEllen Friedland, National Digital Library Program,
Library of Congress

3:00 - 3:30 Suggestions for the future development of the TEI Guidelines

--------------------------------------------------------------------
| This announcement with links to papers, etc. is available on the |
| World Wide Web at <http://www.cs.vassar.edu/~ide/DL96/> |
--------------------------------------------------------------------

D E S C R I P T I O N

The Text Encoding Initiative's Guidelines for Electronic Text Encoding and
Interchange of Machine-Readable Texts were published in May 1994, after six
years of development within the academic and research communities. The
SGML-based Guidelines provide standardized encoding conventions for a large
range of text types and features relevant for a broad range of
applications, including natural language processing, information retrieval,
hypertext, electronic publishing, various forms of literary and historical
analysis, lexicography, etc. The Guidelines are intended to apply to texts,
written or spoken, in any natural language, of any date, in any genre or
text type, without restriction on form or content. They treat both
continuous materials (running text) and discontinuous materials such as
dictionaries and linguistic corpora. As such, the TEI Guidelines offer the
best encoding solution currently available for the development of digital
libraries, where varied and complex texts must be stored and manipulated in
ways that answer a wide variety of user needs, and where the linkage of
multi-media is essential.

The TEI provides encoding conventions for describing the physical and
logical structure of many classes of texts, as well as features particular
to a given text type or not conventionally represented in typography. The
TEI Guidelines also cover common text encoding problems, including intra-
and inter-textual cross reference, demarcation of arbitrary text segments,
alignment of parallel elements, overlapping hierarchies, etc. In addition,
they provide conventions for linking texts to acoustic and visual data. The
TEI's specific achievements include:

o the specification of restrictions on and recommendations for SGML use
that enables maximal generality and flexibility in order to serve the
widest possible range of research, development, and application needs;

o analysis and identification of categories and features for encoding
textual data, at many levels of detail;

o specification of a set of general text structure definitions that is
effective, flexible, and extensible;

o specification of a method for in-file documentation of electronic texts
compatible with library cataloging conventions, which can be used to
trace the history of the texts and thus assist in authenticating their
provenance and the modifications they have undergone--this is especially
valuable for the development of digital libraries;

o specification of encoding conventions for special kinds of texts or text
features, including: character sets, language corpora, general
linguistics, dictionaries, terminological data, spoken texts,
hypermedia, literary prose, verse, drama, historical source materials,
and text critical apparatus.
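
As an illustration of the in-file documentation described in the fourth
item above, the following is a minimal sketch in Python, not an excerpt from
the Guidelines. It builds a small header-like structure and reads the
revision history back out of it. The element names (teiHeader, fileDesc,
titleStmt, publicationStmt, sourceDesc, revisionDesc, change) are taken from
the TEI header as commonly described; the content model is simplified and is
not validated against any TEI DTD. The helper names build_header and
revision_history, and all sample values, are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_header(title, source, changes):
    """Build a simplified, TEI-style header element.

    `changes` is a list of (date, name, description) tuples, one per
    recorded modification of the electronic text. The structure is an
    assumption for illustration and is not checked against a DTD.
    """
    header = ET.Element("teiHeader")

    # Bibliographic description of the electronic file itself.
    file_desc = ET.SubElement(header, "fileDesc")
    title_stmt = ET.SubElement(file_desc, "titleStmt")
    ET.SubElement(title_stmt, "title").text = title
    pub_stmt = ET.SubElement(file_desc, "publicationStmt")
    ET.SubElement(pub_stmt, "p").text = "Unpublished illustrative sample."
    source_desc = ET.SubElement(file_desc, "sourceDesc")
    ET.SubElement(source_desc, "p").text = source

    # Change log used to trace the modifications the text has undergone.
    revision_desc = ET.SubElement(header, "revisionDesc")
    for date, name, description in changes:
        change = ET.SubElement(revision_desc, "change")
        ET.SubElement(change, "date").text = date
        resp_stmt = ET.SubElement(change, "respStmt")
        ET.SubElement(resp_stmt, "name").text = name
        ET.SubElement(change, "item").text = description
    return header


def revision_history(header):
    """Return the recorded changes as (date, name, description) tuples."""
    return [
        (
            change.findtext("date"),
            change.findtext("respStmt/name"),
            change.findtext("item"),
        )
        for change in header.findall("revisionDesc/change")
    ]


if __name__ == "__main__":
    sample = build_header(
        "Sample text",
        "Transcribed from a hypothetical printed edition.",
        [("1996-03-14", "A. Editor", "Initial encoding")],
    )
    print(ET.tostring(sample, encoding="unicode"))
    print(revision_history(sample))
```

Because the description and change log travel inside the same file as the
text, a library system could in principle derive a catalog record or a
provenance report from the file alone.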

The Guidelines also provide an extensible and flexible Document Type
Definition (DTD) framework for text encoding, containing a common core of
features, a choice of frameworks or bases, and a wide variety of optional
additions for specific applications or text types. In addition, the TEI
Guidelines make it possible to encode many different views of a text,
simultaneously if necessary, which is of critical interest for building
digital libraries, where different users may view the same text in many
different ways (physical object, logical structure, rhetorical object,
linguistic object, etc.).

Theme and Goals of the Workshop
-------------------------------

Large-scale application of the Guidelines has been under way since
their release in spring of 1994. Numerous projects in North America and
Europe have recently adopted the Guidelines for a wide variety of
applications. The work of the TEI is now to evaluate, modify and extend the
Guidelines in response to user experience and needs.

This workshop provides a forum for technical discussion and evaluation of
the TEI Guidelines, as they have so far been implemented in real
applications, particularly those which have relevance for building digital
libraries. The topics include but are not limited to:

o detailed description of application of the Guidelines, with
particular emphasis on interesting problems and (TEI or non-TEI)
solutions

o handling unusual or complex text types, or text types not treated in
the Guidelines

o handling multi-media with the Guidelines

o evaluation of the TEI DTD architecture, element and entity classes,
etc.

o encoding multiple views or information types

o proposals for extension of the TEI Guidelines

o data architectures (e.g., multiple linked files, etc.) for storing
complex documents

A second focus of the workshop is the refinement and/or adaptation of the
TEI Guidelines for particular text types and/or applications. Because it
aims at maximal generality, the TEI necessarily takes its encoding
solutions to the highest possible level of abstraction. In addition, the
TEI often provides multiple options for encoding the same phenomenon. The
need to provide mechanisms which are maximally general and flexible is at
times at odds with the provision of mechanisms which are most efficient
and/or effective for a specific application or intended use. To develop an
encoding standard specifically suited to a given application, it is
desirable to choose from among various encoding options the method that is
optimal in the light of intended use. It may also be advantageous to refine
or delimit TEI solutions which are over-general for the needs of a given
application.

In sum, the overall goals of the workshop are (1) to generate a technical
discussion on the applicability of the TEI Guidelines for building digital
libraries, and (2) to provide a forum for a broad assessment of encoding
needs for building digital libraries, in order to obtain a clearer idea of
what these needs are, and, if applicable, the directions in which the
development of the TEI Guidelines and surrounding activities should go to
accommodate them.