Corpora: LREC WORKSHOP ANNOUNCEMENT

Simone Saint Laurent (lrec@ilc.pi.cnr.it)
Fri, 30 Jan 1998 11:32:21 +0100

*We apologize for multiple copies*

WORKSHOP ANNOUNCEMENT AND CALL FOR PAPERS

LINGUISTIC COREFERENCE WORKSHOP
26 May 1998, Morning Session

Held in conjunction with
The First International Conference on Language Resources and Evaluation
Granada, Spain (28-30 May 1998)

WORKSHOP AIMS

It is essential for a natural language processing system to instantiate each
object, process, attribute, and property correctly, so that all references to
the same item are recognized as such and the inventory of all distinct items is
accurate at all times. This problem is far from being resolved. There are both
linguistic and computational reasons for this deficiency. First, there is no
satisfactory microtheory of linguistic coreference. Second, and
consequently, there is no satisfactory application of such a microtheory to
NLP.

A microtheory of coreference in natural language includes in its scope all the
phenomena that satisfy the following condition: an object/entity, an event, an
attribute, a property or its value, an attitude, or any combination of the
above is referred to more than once in a natural-language text, and the
understanding of the text depends on the correct interpretation of the two or
more referring expressions as designating the same object, event, etc. A
linguistic microtheory of coreference for a language consists of the following
elements:
- a complete range of covered phenomena in the language;
- a taxonomy of the range;
- a typology of the range;
- a list of rules forming the various types of coreference;
- a list of rules interpreting the various types of coreference.

There has been a considerable amount of work on a few selected types of
coreference, focusing almost exclusively on object coreference. Thus,
significant work has been done in theoretical linguistics on anaphora and
cataphora, subsuming, for the most part, earlier work on deixis. A small
minority of authors have tried to extend their studies of anaphora beyond mere
syntax. In the cognitive-linguistics and philosophy-of-language traditions,
interesting work has been done relating anaphora and deixis to ambiguity
resolution and discourse structure. At the same time, an effort in
comparative-contrastive linguistics has led some writers to examine the data
of more than one language at a time, still emphasizing entity or object
reference.

In computational linguistics, the problem of coreference early on took the form
of pronoun-antecedent resolution, and this particular task, somewhat broadened
to include a few other types of anaphora, still remains at the center of the
problem. The most sustained effort in the computational treatment of
coreference has been mounted within the Tipster/MUC-6 initiative. While it has
been recognized from quite early on that coreference resolution is based in
large part on world knowledge, most of the work done on the matter, both
computational and theoretical, ignores or avoids world knowledge. The MUC-6
initiative makes such an orientation quite explicit: the work should be based
on such simpler resources as part-of-speech tagging, simple noun phrase
recognition, basic semantic category information such as gender and number, and
(to a limited extent) full parse trees. Such an approach, which tries to
explore and maximize everything that can be done simply and cheaply towards the
resolution of a complex problem, is perfectly legitimate as long as it is
realized that a considerable part of the problem remains unsolved, and this is
indeed fully realized within the MUC-6 initiative.
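
As a rough illustration of the shallow, knowledge-poor style of resolution
described above, the following is a minimal sketch in Python. It is not the
MUC-6 method or any participant's system: the Mention class, the agrees and
resolve functions, the feature codes, and the hand-annotated example sentence
are all hypothetical and chosen only for illustration. The sketch links a
pronoun to the nearest preceding noun phrase that agrees with it in gender and
number, and uses no world knowledge at all.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mention:
    """A referring expression with hand-assigned shallow features (hypothetical)."""
    text: str
    position: int        # token index of the mention's first word
    gender: str          # "m", "f", "n", or "?" when unknown
    number: str          # "sg" or "pl"
    is_pronoun: bool = False


def agrees(a: str, b: str) -> bool:
    """Two feature values agree when they are equal or either one is unknown."""
    return a == b or "?" in (a, b)


def resolve(pronoun: Mention, mentions: List[Mention]) -> Optional[Mention]:
    """Return the nearest preceding non-pronoun mention that agrees with the
    pronoun in gender and number, or None when no candidate qualifies."""
    candidates = [
        m for m in mentions
        if not m.is_pronoun
        and m.position < pronoun.position
        and agrees(m.gender, pronoun.gender)
        and agrees(m.number, pronoun.number)
    ]
    return max(candidates, key=lambda m: m.position, default=None)


# Illustrative input: "Mary met the engineers. She thanked them."
# Features are assigned by hand here; a real system would need a tagger
# and a noun phrase recognizer to produce them.
mentions = [
    Mention("Mary", 0, "f", "sg"),
    Mention("the engineers", 2, "?", "pl"),
    Mention("She", 4, "f", "sg", is_pronoun=True),
    Mention("them", 6, "?", "pl", is_pronoun=True),
]

for m in mentions:
    if m.is_pronoun:
        antecedent = resolve(m, mentions)
        print(m.text, "->", antecedent.text if antecedent else None)
```

Because the sketch always prefers the nearest agreeing candidate, it has no way
to choose between two candidates that both agree with the pronoun except by
distance. Cases of that kind, where the choice depends on what the reader
knows about the world, belong to the part of the problem that remains unsolved
under this approach.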

One persistent problem throughout the existing computational ventures into
coreference has been the lack of a consistent theoretical approach to it. The
result is that coreference phenomena are treated as self-evident, and most of
them are overlooked, especially if they are not explicit pronoun-antecedent or
other equally evident anaphora cases. What is needed for a full, accurate, and
reliable approach to coreference can be summarized, somewhat schematically, as
involving the following steps:

1. understanding fully the range of the phenomenon and
of the rules that govern it (theory);
2. determining the extent of machine-tractable information
in the rules;
3. taking stock of all the rules that can be computed;
4. developing the appropriate heuristics for the computable rules;
5. computing the rules.

WORKSHOP AGENDA

The workshop will be held during the morning session of 26 May 1998 and will
include a joint address by the Organizing Committee (listed below), followed by
5-8 individual presentations in two 90-120-minute blocks, with a break provided
midway through.

CALL FOR PAPERS

The Workshop solicits papers addressing one or more of the points raised
above, as well as any other pertinent issues.

Papers based on a diversity of languages are encouraged, both one language at a
time and, especially, comparative/contrastive studies. Also strongly encouraged
are papers which extend the study of coreference beyond entity/object
reference, across document boundaries, and/or into non-text media.

FORMAT FOR SUBMISSION

Paper submissions should consist of an extended abstract of approximately 800
words, along with a brief description of the proposed presentation structure
(e.g., paper, paper plus demo, etc.).

Each submission should include a separate title page, providing the following
information: the title to be printed in the Conference program; names and
affiliations of all authors; the full address of the primary author (or
alternate contact person), including phone, fax, email; and required
audio-visual equipment.

Papers may be submitted by sending three hard copies or one soft copy (in TeX,
ASCII, or PostScript format) to the address listed below:

Dr. Victor Raskin
Chair, Interdepartmental Program in Linguistics
Heavilon Hall
Purdue University
West Lafayette, IN 47907 USA

vraskin@purdue.edu

Submissions must be received no later than 1 March 1998 for a 15 March
notification of paper acceptance. (Full versions of all accepted papers are
requested no later than 15 April 1998 for inclusion in the conference
proceedings.)

WORKSHOP ORGANIZING COMMITTEE

Dr. Sara J. Shelton (Contact Person)
US Department of Defense
9800 Savage Road, R525
Ft Meade, MD 20755 USA
sjshelt@afterlife.ncsc.mil
301-688-0301 (voice)
301-688-0338 (fax)

Dr. Eduard Hovy
Information Sciences Institute
University of Southern California
4676 Admiralty Way
Marina Del Rey, CA 90292-669 USA
hovy@isi.edu
310-822-1511, ext. 731 (voice)

Dr. Victor Raskin
Interdepartmental Program in Linguistics
Heavilon Hall
Purdue University
West Lafayette, IN 47907 USA
vraskin@purdue.edu
765-494-3782 (voice)
765-494-3780 (fax)