[Corpora-List] MEMURA 2004 - Call For Participation

From: ddg@di.ubi.pt
Date: Wed May 05 2004 - 15:38:56 MET DST

    ******************CALL FOR PARTICIPATION******************

    Workshop on Methodologies and Evaluation of Multiword Units
                      in Real-world Applications
                       (MEMURA 2004 Workshop)

             (in association with the 4th INTERNATIONAL
          CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION)

             Centro Cultural de Belém, Lisbon, Portugal
                           May 25, 2004

                    http://memura2004.di.ubi.pt

                         INVITED SPEAKER

                        Dr. Kenneth Church

    **********************************************************

    [1] Workshop Description
    [2] Target Audience
    [3] Programme
    [4] Contact

    [1] Workshop Description:
    ------------------------

    Multiword units (MWUs) include a large range of linguistic phenomena, such
    as phrasal verbs (e.g. "look forward"), nominal compounds (e.g. "interior
    designer"), named entities (e.g. "United Nations"), set phrases (e.g. "con
    carne") or compound adverbs (e.g. "by the way"), and they can be
    syntactically and/or semantically idiosyncratic in nature. MWUs are used
    frequently in everyday language, usually to give precise expression to ideas
    and concepts that cannot be compressed into a single word. A considerable amount
    of research has been devoted to this subject, both in terms of theory and
    practice, but despite increasing interest in idiomaticity within linguistic
    research, many questions still remain unanswered. The objective of this
    workshop is to deal with three important questions that are of great
    interest for real-world applications.

    1) Comparison of MWU extraction methodologies

    Many methodologies have been proposed to automatically extract or identify
    MWUs. However, little effort has been devoted to comparing their results.
    The core differences between the methodologies are certainly the main
    reason why such work is so rare. For instance, it is not easy to compare
    language-dependent methodologies, as the results depend on the efficiency
    of parameter tuning in the broad sense of the term (i.e. semantic tagging,
    local specific grammars, lemmatization, part-of-speech tagging, etc.).
    Another important problem is that there is no real agreement among
    researchers on the definition of MWUs, which would provide the basis for
    an objective evaluation. The objective of the workshop is to gather people
    who have recently been working in this area so that new trends in
    comparing MWU extraction methodologies and their evaluation can be
    identified.

    2) Evaluation of the benefits of the integration of MWUs in real-world
    applications

    It is not yet clear whether MWUs really improve NLP applications. It is
    commonly accepted that Machine Translation is one application that benefits
    greatly from MWU databanks. However, does the same apply to applications in
    Automatic Summarization, Information Retrieval (IR), Cross-language IR,
    Information Extraction, Text Clustering/Classification, or Parallel Corpus
    Alignment? Indeed, could the identification of MWUs introduce new
    constraints that are not present in the original texts? Should MWUs be
    considered as units that can be analysed in terms of the meaning of their
    components, or should they be treated as unanalysable? Should NLP
    methods work both on isolated words and on aggregated MWUs?

    The answers are anything but clear. Here, the objective of the workshop is
    to highlight successes and failures of the integration of MWUs in real-world
    applications.

    3) Comparison of scalable architectures for the extraction and
    identification of MWUs

    Real-world applications are constrained by variables like processing time
    and memory space. However, identifying and extracting MWUs is usually a
    computationally heavy process. In recent years, new algorithms and new
    technologies have been proposed to introduce MWU treatment in large-scale
    applications, thus avoiding earlier intractable implementations. Previous
    workshops on MWUs have mainly focused on the unconstrained extraction
    process. In this workshop, we would like to focus on the comparison of
    different factors that can influence the scalability of the treatment of
    MWUs in real-world applications, namely data structures, algorithms,
    parallel and distributed computing, grid computing, etc. Indeed, as noted
    above, some extraction strategies may not scale to huge volumes of data.

    [2] Target Audience:
    --------------------

    This workshop is intended to bring together NLP researchers working on all
    areas of MWUs. The objective is to summarise what has been achieved in the
    area of MWUs in real-world applications, to establish common themes between
    different approaches, and to discuss future trends.

    [3] Programme:
    --------------

    9h00 - 9h45 - Invited Speaker - Kenneth W. Church

    9h45 - 10h05 - Japanese Multiword Extraction using SVM and Adaptation - T.
    Ogata, K. Terao, K. Umemura - Toyohashi University of Technology - Japan

    10h05 - 10h25 - Multiword Expressions Recognition with the LVQ Algorithm -
    M.C. Díaz-Galiano, M.T. Martín-Valdivia, F. Martínez-Santiago, L.A.
    Ureña-López - University of Jaén - Spain

    10h25 - 10h45 - A Parallel Multikey Quicksort Algorithm for Mining Multiword
    Units - R. Pereira, P. Crocker, G. Dias - Beira Interior University - Portugal

    10h45 - 11h00 - Coffee Break

    11h00 - 11h20 - Recognition and Paraphrasing of Periphrastic and Overlapping
    Verb Phrases - N. Kaji, S. Kurohashi - University of Tokyo - Japan

    11h20 - 11h40 - Transducing Text to Multiword Units - C.H.A. Koster -
    University of Nijmegen - The Netherlands

    11h40 - 12h00 - Multiword Units in Syntactic Parsing - J. Nivre and J.
    Nilsson - Växjö University - Sweden

    12h00 - 12h20 - Use of Noun Phrases in Interactive Search Refinement - O.
    Vechtomova, M. Karamuftuoglu - University of Waterloo - Canada

    12h20 - 12h40 - Comparative Evaluation of C-value in the Treatment of Nested
    Terms - S. Vintar - University of Ljubljana - Slovenia

    12h40 - 13h00 - Discussion and Closing Session

    [4] Contact
    -----------

    Gaël Dias
    Human Language Technology Interest Group
    Departamento de Informática
    Universidade da Beira Interior
    Rua Marquês d'Ávila e Bolama
    6201-001 Covilhã Portugal
    email: ddg@di.ubi.pt
    Tel: +351 275319700 - Mob: +351 918612700 - Fax: +351 275319 732


