[Corpora-List] MEMURA 2004 - Call For Participation

From: ddg@di.ubi.pt
Date: Wed May 05 2004 - 15:38:56 MET DST

******************CALL FOR PARTICIPATION******************

Workshop on Methodologies and Evaluation of Multiword Units
in Real-world Applications
(MEMURA 2004 Workshop)

(in association with the 4th INTERNATIONAL
CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION)

Centro Cultural de Belém, Lisbon, Portugal
May 25, 2004

http://memura2004.di.ubi.pt

INVITED SPEAKER

Dr. Kenneth W. Church

**********************************************************

[1] Workshop Description
[2] Target Audience
[3] Programme
[4] Contact

[1] Workshop Description:
------------------------

Multiword units (MWUs) include a large range of linguistic phenomena, such
as phrasal verbs (e.g. "look forward"), nominal compounds (e.g. "interior
designer"), named entities (e.g. "United Nations"), set phrases (e.g. "con
carne") or compound adverbs (e.g. "by the way"), and they can be
syntactically and/or semantically idiosyncratic in nature. MWUs are used
frequently in everyday language, usually to express precisely those ideas
and concepts that cannot be compressed into a single word. A considerable
amount of research has been devoted to this subject, both in theory and in
practice, but despite increasing interest in idiomaticity within linguistic
research, many questions remain unanswered. The objective of this workshop
is to address three questions that are of great interest for real-world
applications.

1) Comparison of MWU extraction methodologies

Many methodologies have been proposed to automatically extract or identify
MWUs. However, little effort has been devoted to comparing their results.
The core differences between the methodologies are certainly the main
reason why such work is so rare. For instance, it is not easy to compare
language-dependent methodologies, as the results depend on the efficiency of
parameter tuning in the broad sense of the term (i.e. semantic tagging,
local specific grammars, lemmatization, part-of-speech tagging etc.).
Another important problem is that there is no real agreement among
researchers on the definition of MWUs, which would provide the basis for an
objective evaluation. The objective of the workshop is to gather people who
have recently been working in this area so that new trends in comparing MWU
extraction methodologies and their evaluation can be identified.

2) Evaluation of the benefits of the integration of MWUs in real-world
applications

It is not yet clear whether MWUs really improve NLP applications. It is
commonly accepted that Machine Translation is one application that takes
great advantage of MWU databanks. However, does the same apply to
applications in Automatic Summarization, Information Retrieval (IR),
Cross-language IR, Information Extraction, Text Clustering/Classification
and Parallel Corpus Alignment? Indeed, could the identification of MWUs
introduce new constraints that are not present in the original texts? Should
MWUs be considered as units that can be analysed in terms of their
components' meanings, or should they be treated as unanalysable? Should NLP
methods work both on isolated words and on aggregated MWUs?

The answers are anything but clear. Here, the objective of the workshop is
to highlight successes and failures of the integration of MWUs in real-world
applications.

3) Comparison of scalable architectures for the extraction and
identification of MWUs

Real-world applications are constrained by variables like processing time
and memory space. However, identifying and extracting MWUs is usually a
computationally heavy process. In recent years, new algorithms and new
technologies have been proposed to introduce MWU treatment in large-scale
applications, thus avoiding earlier intractable implementations. Previous
workshops on MWUs have mainly focused on the unconstrained extraction
process. In this workshop, we would like to focus on the comparison of
different factors that can influence the scalability of the treatment of
MWUs in real-world applications, namely data structures, algorithms,
parallel and distributed computing, grid computing etc. Indeed, as we said
earlier, some extraction strategies may not scale to deal with huge volumes
of data.

[2] Target Audience:
--------------------

This workshop is intended to bring together NLP researchers working in all
areas of MWUs. The objective is to summarise what has been achieved in the
area of MWUs in real-world applications, to establish common themes between
different approaches, and to discuss future trends.

[3] Programme:
--------------

9h00 - 9h45 - Invited Speaker - Kenneth W. Church

9h45 - 10h05 - Japanese Multiword Extraction using SVM and Adaptation - T.
Ogata, K. Terao, K. Umemura - Toyohashi University of Technology - Japan

10h05 - 10h25 - Multiword Expressions Recognition with the LVQ Algorithm -
M.C. Díaz-Galiano, M.T. Martín-Valdivia, F. Martínez-Santiago, L.A.
Ureña-López - University of Jaén - Spain

10h25 - 10h45 - A Parallel Multikey Quicksort Algorithm for Mining Multiword
Units - R. Pereira, P. Crocker, G. Dias - Beira Interior University - Portugal

10h45 - 11h00 - Coffee Break

11h00 - 11h20 - Recognition and Paraphrasing of Periphrastic and Overlapping
Verb Phrases - N. Kaji, S. Kurohashi - University of Tokyo - Japan

11h20 - 11h40 - Transducing Text to Multiword Units - C.H.A. Koster -
University of Nijmegen - The Netherlands

11h40 - 12h00 - Multiword Units in Syntactic Parsing - J. Nivre and J.
Nilsson - Växjö University - Sweden

12h00 - 12h20 - Use of Noun Phrases in Interactive Search Refinement - O.
Vechtomova, M. Karamuftuoglu - University of Waterloo - Canada

12h20 - 12h40 - Comparative Evaluation of C-value in the Treatment of Nested
Terms - S. Vintar - University of Ljubljana - Slovenia

12h40 - 13h00 - Discussion and Closing Session

[4] Contact
-----------

Gaël Dias
Human Language Technology Interest Group
Departamento de Informática
Universidade da Beira Interior
Rua Marquês d'Ávila e Bolama
6201-001 Covilhã Portugal
email: ddg@di.ubi.pt
Tel: +351 275319700 - Mob: +351 918612700 - Fax: +351 275319 732
