International Workshop on Industrial Parsing of Software Manuals 1995

Richard Sutcliffe (sutcliffer@ul.ie)
Mon, 27 Feb 95 16:36:05 GMT

Second Call for Participants

I P S M ' 9 5

International Workshop on

Industrial Parsing of Software Manuals 1995

Thursday 4 - Friday 5 May 1995

University of Limerick

Ireland

Sponsored by the European Union

Organisers:

Heinz-Detlev Koch, University of Heidelberg
Richard F. E. Sutcliffe, University of Limerick

Introduction
============

A considerable potential market exists for robust systems which can perform
free text information retrieval, machine assisted translation, summarisation,
routing and related tasks on pre-existing documents. An important component in
many systems of this kind is a parser which will allow grammatical (and hence
semantic) relations between textual units to be determined.

There is a large literature on parsing but only recently has much emphasis been
placed upon robust coverage of real texts rather than the fragile analysis of
simple example sentences. Competitions such as the Text Retrieval Conference
(TREC) and the Message Understanding Conference (MUC) have provided a useful
focus on practical considerations by concentrating on the optimisation of
overall system performance in a specific task which is shared by all
participants. Our aim is to concentrate on parsing alone and to undertake a
more detailed comparison of approaches.

Discussion and experimentation will revolve around our chosen text domain - the
instruction manuals shipped with computer software for the PC. This is an
important area because there is an increasing convergence between the
development of accompanying documentation and the provision of on-line help. In
addition there is a need to automate the complex process of product
localisation. One aspect of this process is the semi-automatic translation of
accompanying documents into the language of each target market, a task which is
greatly facilitated if an accurate syntactic analysis of the texts is
available.

It is not our intention to compare the literal characteristics of different
parsing algorithms as this topic is very well covered already. Rather we wish
to investigate the efficacy of competing parsing *systems*, each comprising a
parser and grammar taken together. What exactly are the difficulties faced in
trying to analyse utterances derived from the domain of instruction manuals,
and what are the strengths and weaknesses of the various parsing methods when
applied to it? The following general issues are of interest:

* How easy is it to develop a grammar for the parser?

* What resources already exist for the parser and how easy are they to modify
for new purposes?

* What type of data structures are returned by the parser?

* What information do these data structures contain and how easy is
it to extract that information?

Organisation of Workshop
========================

On 27 February 1995, each participating group will be supplied with three
technical texts in machine readable form together with three lists of
terminology occurring in these texts. Each text comprises 200 utterances.
The documents being used are:

* the Dynix Automated Library Systems Searching Manual

* the Lotus Ami Pro for Windows User's Guide Release Three

* the Trados Translator's Workbench for Windows User's Guide

(Permission has kindly been granted by the copyright holders of these extracts
enabling them to be used for the purposes of the IPSM'95 workshop.)

Each group has ten weeks in which to carry out three experiments on the above
documents, namely Analysis I, Analysis II and Analysis III. Essentially
Analysis I involves trying the parser on the sentences while only altering the
lexical analysis component. Under Analysis II groups are permitted to add to
the lexicon or terminological database of their parser as necessary, using our
list of compounds as a starting point. Within Analysis III (optional) it is
permitted to alter the underlying grammar or parsing algorithms.

The objective within each phase is to carry out an analysis of the resulting
parse trees with a view to establishing the efficacy of the parsing system with
respect to the following kinds of issues:

* The resolution of prepositional phrase attachment - how accurately can this
be accomplished in this domain?

* The analysis of and/or/comma coordination - what characteristics do such
coordinations have in this domain and what techniques can be used to analyse
them accurately?

* The handling of gapping and other forms of ellipsis - to what extent is the
best approach to this problem dependent on the parsing system being used?

* Efficiency - how long does it take to analyse the sentences and what space
requirements are entailed?

Finally, participants will meet in Limerick in May to compare results and to
discuss their implications. Proceedings of the workshop will be produced as a
UL Technical Report, to be published after the meeting.

Important Dates
===============

27.02.95 Test Materials available to participants
21.04.95 Parse trees to be submitted are announced
27.04.95 Printouts of parse trees received at both Heidelberg and Limerick
27.04.95 Articles for proceedings received at Limerick
04.05.95 Workshop at Limerick, Day One
05.05.95 Workshop at Limerick, Day Two
30.06.95 Preliminary fair copy of articles received at Limerick

Participation
=============

The following have already agreed to take part in the workshop:

Lingware Ltd., Szeged, Hungary
National University of Singapore
Rank Xerox Research Centre, Grenoble, France
University College Dublin, Ireland
University of Heidelberg, Germany
University of Helsinki, Finland
University of Limerick, Ireland
University of Manitoba, Canada
University of Nijmegen, The Netherlands
University of Pennsylvania, USA

Proposals for participation from other groups are welcome. Detailed instructions
together with the test materials are now available. Please contact the
organisers for further information.

The above groups all use systems consisting of a parser together with either a
grammar or a syntagmatic lexicon. However, we are happy to include teams who
work with statistical or connectionist methods.

Natural Language Processing at UL
=================================

The NLP group at Limerick has interests in computational lexicology and
lexicography, computer assisted language learning, information retrieval,
machine assisted translation, software localisation, syntagmatic parsing and
word sense disambiguation.

The group has received sponsorship from a number of sources including Apple
Computer, the European Union, EOLAS, the Industrial Development Board for
Northern Ireland, K & M, Microsoft, the National Software Directorate and
Siemens.

Previous conferences organised at UL include Artificial Intelligence and
Cognitive Science 1992, Eagna chun Gnimh (Irish Computational Linguistics),
Natural Language Processing in Ireland 1992 (NLPI'92), NLPI'93, and
International Workshop on Machine Translation 1994.

About the University of Limerick
================================

The University of Limerick is situated on rolling parkland beside the historic
River Shannon. The tranquil red brick campus is complemented by lakes,
fountains and sculptures.

The workshop will be based in the Robert Schuman Building, home of the Computer
Science and Information Systems Department, an airy and majestic structure
incorporating lecture theatres, meeting rooms, computer laboratories and
research centres, all equipped to the highest standards.

The Shannon Region is especially beautiful in early summer, and there are many
sights worth visiting: Bunratty Castle, the thousand foot high Cliffs of Moher,
Rock of Cashel, King John's Castle, and Aillwee Caves are all close by. In
addition there is Loch Derg, teeming with wildlife on its many islands and
surrounded by blue-green hills and lofty mountains.

The University is easily reached - Shannon International Airport is only 20 km
away, with regular direct flights to London, New York, and many other cities.

Heinz-Detlev Koch
Lehrstuhl fuer Computerlinguistik
Karlstrasse 2
D-69117 Heidelberg
Deutschland

+49 6221 543 248 Direct
+49 6221 543 242 Fax

koch@novell1.gs.uni-heidelberg.de

Richard F.E. Sutcliffe
Department of Computer Science
and Information Systems
University of Limerick
Ireland

+353 61 202706 Direct
+353 61 330876 Fax

email sutcliffer@ul.ie

          +-------------EV------------+
          |     +--------J-------+    |
          |     |  +------DP-----+    |        +-------O-------+
+-S+--AI--+--EV-+  |      +--MP--+    +----M---+       +---D---+
|  |      |     |  |      |      |    |        |       |       |
We are grateful to the European Union for sponsoring.v the workshop.n

(Tree produced using the Link Parser of Sleator and Temperley)