Text encoding course at SIGIR'95

ERASMUS@v9001.ntu.ac.sg
Wed, 10 May 1995 12:13:50 +0800

________________________________________________________

REUSABILITY, INTERCHANGEABILITY, AND COMPATIBILITY:
ANSWERING THE QUESTIONS OF TEXT ENCODING STANDARDS

Lou Burnard, Oxford University
Judith Klavans, Columbia University
C. M. Sperberg-McQueen, University of Illinois at Chicago

A PRE-CONFERENCE COURSE
to be held in association with
SIGIR '95:
18th International Conference on Research and Development
in Information Retrieval
Seattle, WA, USA
Saturday, July 8, 1995
8:30 a.m. - 3:30 p.m.
________________________________________________________

SIGIR '95, an international research conference on
information retrieval theory, systems, practice and
applications, will be held in Seattle, WA, from July 9-13. On
the Saturday prior to the conference, a one-day course will be
offered covering the theory and practice of markup languages
for the representation of textual and other data, such as SGML
and the Text Encoding Initiative. The course will be taught by
Lou Burnard, Judith Klavans, and C. M. Sperberg-McQueen.

COURSE DESCRIPTION:
The representation of textual data has raised serious
problems since the early days of digital technology.
Incompatibilities between representations range from simple
formatting issues, such as word delimitation, to data encoding
schemes, such as 7-bit encoding for English, 8-bit for
accented languages, up to 32-bit for Asian languages.
Furthermore, the complications seem to be growing as the
amount of digital data increases. Recognizing the predicament
these complications cause in the information age, a group of
researchers and practitioners, sponsored by the Association
for Computational Linguistics, the Association for Computers
and the Humanities, and the Association for Literary and
Linguistic Computing, joined in 1988 to explore ways to
resolve the serious emerging incompatibilities in the
representation of text. The Text Encoding Initiative has
addressed these problems by developing detailed SGML Document
Type Definitions (DTDs) to achieve comprehensive and
generalizable encoding standards for a range of data types,
from verse to syntactic analyses, from spoken language to
hypertext, from terminological data to multilingual corpora.

This one-day course will consist of three parts: the first
will describe the challenges raised by the three ``abilities''
which concern effective text representation: reusability,
interchangeability, and compatibility. The next section of
the course will present the types of data handled so far by
the TEI encoding scheme, some of the problems already solved,
some ongoing projects, and some unsettled questions. Finally,
if hands-on facilities are available, we will provide a session
in which participants can experience the strengths of using the
TEI for building intelligent text data bases from existing
on-line texts. Otherwise, we will demonstrate widely available
software and discuss the practical issues involved.

The course will be of interest to computer scientists who
are building large test-beds of textual data, researchers who
must analyze and encode representational systems over such
data, practitioners who must solve the incompatibility problem
by choosing a standard encoding scheme for textual data, SGML
hackers who want to know more about TEI DTDs, and humanists who
want to learn more about the issues in text representation.
Since most of IR currently operates over textual data, the
indexing issues in the TEI are of particular and pressing
interest to the IR audience.

Further information can be found at:
http://www.columbia.edu/~klavans/home.html
http://www-tei.uic.edu/pub/tei/sigir.html
Questions re workshop content should be directed to C.M.
Sperberg-McQueen, u35395@uicvm.cc.uic.edu; addresses for
queries re registration and accommodation are given below.

MATERIALS AND PRESENTERS
All participants will be provided with a printed
introductory summary guide to the TEI scheme and supporting
materials on PC disks, including full versions of the TEI
DTDs, public domain SGML software and sample TEI texts. The
electronic version of the Guidelines will also be provided.

Lou Burnard, of Oxford University Computing Services, is
the European editor of the TEI project. He has degrees in
English literature from Oxford, and has worked in computing
since the 1970s. His areas of expertise are in the
applications of computing to linguistic and literary research,
particularly with reference to database and text retrieval
systems. He has published and lectured widely on these and
related topics. His present responsibilities, aside from TEI
work, include management of the British National Corpus
project at OUCS, and the Oxford Text Archive, of which he is
Director.

Judith Klavans is the Director of the Center for Research
on Information Access (CRIA) at Columbia University. The
goals of the Center, established in January 1995, are to
integrate and coordinate the various digital library related
activities at Columbia University, to push forward research on
technologies related to information access, and to serve as a
source of information on the technological aspects of digital
library applications to external projects. Dr. Judith Klavans
has a research career which combines aspects of computer
science and linguistics, including the automatic acquisition
of lexical knowledge, multilingual text analysis, and the
development of symbolic techniques for the presentation of
information within the context of digital libraries.

C. M. Sperberg-McQueen is a senior research programmer at
the academic computer center at the University of Illinois at
Chicago; he currently works in the database group, on SGML
applications and the university library's information arcade.
Since 1988 he has been editor in chief of the ACH/ACL/ALLC
Text Encoding Initiative.

REGISTRATION:
The course costs $50 before May 29 and $65 after May 29; the
fee includes a box lunch and course documentation. The
attached registration form covers this course only.

Attendance at SIGIR '95 is not required for this course.
Those wishing to attend SIGIR as well should complete the
separate SIGIR registration form; a copy, plus full information
on SIGIR '95, including descriptions of tutorials, workshops,
all technical sessions, and accommodation, is available
from ftp.u.washington.edu (\public\sigir95\program) by
anonymous ftp, or via WWW at URL:
http://info.sigir.acm.org/sigir/conferences/SIGIR_95_adv.pgm.html
You may also request a copy of the program by mail by
contacting sigir95@u.washington.edu.

The course venue will depend on enrolment but at present it
is expected that it will be at the SIGIR conference hotel, the
Seattle Sheraton Hotel & Towers, 1400 Sixth Avenue, Seattle,
WA 98101. Details of conference accommodation are available
from the ftp and WWW addresses above.

Cut here: >--------------------------------------------------

SGML/TEI COURSE REGISTRATION FORM
in conjunction with SIGIR '95
Seattle, WA, USA, July 8, 1995

Please use block letters or type, and tick where appropriate

__ Mr. __ Ms. __ Dr. __ Prof. Other: ______

LAST NAME:________________ FIRST NAME:_______________________

BADGE NAME (if different): __________________________________

COMPANY/ORGANIZATION:________________________________________

ADDRESS:_____________________________________________________

CITY:__________________ STATE:______ ZIP CODE: __________

COUNTRY:_______________ PHONE: ( ___ )____________________

FAX: ( ___ ) _______________ EMAIL: ________________________

COURSE REGISTRATION FEE:
($50 prior to May 29; $65 after May 29) $ ________________

DO YOU HAVE ANY SPECIAL NEEDS? Please explain:
___________________________________________________________

ARE YOU ALSO ATTENDING SIGIR '95? ____ yes ____ no

METHOD OF PAYMENT (US Currency only):

__ Check payable to ACM/SIGIR95
__ Credit card (Visa, MC, AMEX)
____________________________________
Credit card number, expiration date

______________________________________
Signature, date
(I authorize the fees indicated above to be charged to my account)

Return Registration Form by May 29 to qualify for early
registration. Use fax or email (credit card payment) or mail
(check or credit card) to:
SIGIR95
c/o Convention Services Northwest
1809 Seventh Avenue, Suite 1414
Seattle, WA 98101 USA
Fax: +1 206-292-0559
Email: SIGIR95@aol.com
(Registration queries to: +1 206-292-9198; ask for Sarah
Amendola)
______________________________________________________________