Corpora: DEADLINE EXTENSION: ACL-2001 Workshop on Evaluation Methodologies for Language & Dialogue Systems

From: Priscilla Rasmussen (rasmusse@cs.rutgers.edu)
Date: Wed Apr 11 2001 - 17:40:40 MET DST

    [ Extended submission deadline: **22 April**]

    Call for Papers

    Workshop on Evaluation Methodologies for Language and Dialogue Systems
    ACL/EACL 2001
    Toulouse, France
    July 6-7, 2001

    WORKSHOP GOALS

    The aim of this two-day workshop is to identify and synthesize
    current needs for language-technology evaluation.

    The first day of the workshop will focus on one of the most challenging
    current issues in language engineering: the evaluation of dialogue
    systems and models. The second day will extend the discussion to address
    the problem of evaluation in language engineering more broadly and on
    more theoretical grounds.

    The space of possible dialogues is enormous, even for limited domains
    like travel information servers. The generalization of evaluation
    methodologies across different application domains and languages is an
    open problem. Review of published evaluations of dialogue models and
    systems suggests that usability techniques are the standard method.
    Dialogue-based systems are often evaluated in terms of standard,
    objective usability metrics, such as task-completion time and number of
    user actions. Researchers have proposed and debated theory-based
    methods for modifying and testing the underlying dialogue model, and
    more precise, empirical methods for evaluating the effectiveness of
    dialogue models have been put forward, but usability testing remains
    the most widely used method of evaluation. For task-based interaction,
    typical measures of effectiveness are time-to-completion and task
    outcome, yet evaluation should arguably focus on user satisfaction
    rather than on arbitrary effectiveness measurements. Indeed, the
    problems faced in current approaches to measuring the effectiveness of
    dialogue models and systems include:

      o Direct measures are unhelpful because efficient performance on the
    nominal task may not represent the most effective interaction.
      o Indirect measures usually rely on judgment and are vulnerable to
    weak relationships between the inputs and outputs.
      o Subjective measures are unreliable and domain-specific.
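
    As an illustration of the objective usability metrics mentioned above
    (time-to-completion, number of user actions), here is a minimal sketch
    in Python; the log format, field names, and the rule that the last turn
    ends the task are illustrative assumptions, not part of this call.

        # Minimal sketch of objective usability metrics for one task-based
        # dialogue.  The log layout and the "last turn ends the task" rule
        # are assumptions made for illustration only.
        from datetime import datetime

        # Each turn: (ISO timestamp, speaker, utterance).
        dialogue_log = [
            ("2001-07-06T10:00:00", "user",   "I need a flight to Toulouse."),
            ("2001-07-06T10:00:05", "system", "Departing from which city?"),
            ("2001-07-06T10:00:12", "user",   "From Paris, on July 6."),
            ("2001-07-06T10:00:20", "system", "Booked: Paris to Toulouse, July 6."),
        ]

        def time_to_completion(log):
            # Elapsed seconds between the first and the last turn.
            start = datetime.fromisoformat(log[0][0])
            end = datetime.fromisoformat(log[-1][0])
            return (end - start).total_seconds()

        def user_action_count(log):
            # Number of user turns, a common proxy for interaction effort.
            return sum(1 for _, speaker, _ in log if speaker == "user")

        print("time to completion (s):", time_to_completion(dialogue_log))
        print("user actions:", user_action_count(dialogue_log))

    Such direct measures are easy to compute but, as noted above, efficient
    performance on the nominal task need not correspond to the most
    effective or satisfying interaction.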
    For its first day, the workshop organizers solicit papers on these
    issues, with particular emphasis on methods that go beyond usability
    testing to address the underlying dialogue model. Representative
    questions to be addressed include:

      o How do we deal with the combinatorial explosion of dialogue states?
      o How can satisfaction be measured with respect to underlying dialogue
    models?
      o Are there useful direct measures of dialogue properties that do not
    depend on task efficiency?
      o What is the role of agent-based simulation in evaluation of dialogue
    models?

    Of course, the problems faced in evaluating dialogue models and systems
    are found in other domains of language engineering, even for
    non-interactive processes such as part-of-speech tagging, parsing,
    semantic disambiguation, information extraction, speech transcription,
    and audio document indexing. The issue of evaluation can thus be viewed
    at a more generic level, raising fundamental, theoretical questions
    such as:

      o What are the interest and the benefits of evaluation for language
    engineering?
      o Do we really need these specific methodologies, since some form of
    evaluation should always be present in any scientific investigation?
      o If evaluation is needed in language engineering, is that the case
    for all of its domains?
      o What form should it take: technology evaluation (task-oriented, in
    a laboratory environment) or field/user evaluation (complete systems in
    real-life conditions)?

    We have seen above that the evaluation of dialogue models remains an
    unsolved problem, but for domains where metrics already exist, are they
    satisfactory and sufficient? How can we take into account, or abstract
    away from, the subjective factor introduced by human operators in the
    process? Do similarity measures and standards offer appropriate answers
    to this problem? Most efforts focus on evaluating processes, but what
    about the evaluation of language resources?

    For its second day, the workshop organizers solicit papers on these
    issues, with the intent to address the problem of evaluation both from
    a broader perspective (including novel application domains for
    evaluation, new metrics for known tasks, and resource evaluation) and
    from a more theoretical point of view (including a formal theory of
    evaluation and the infrastructural needs of language engineering).

    NOTE: People who would like to submit a paper on lexical semantic
    disambiguation evaluation should consider the parallel workshop
    (July 5-6), which closes the SENSEVAL-2 evaluation campaign.

    -------------------------------------------------------------

    WORKSHOP ORGANIZATION

    The organization of each of the two days of the workshop will reflect
    the workshop's two main themes. Each day will begin with a session of
    presentations of selected papers, followed by panel discussions that
    synthesize and develop possible methodologies from additional selected
    workshop papers.

    WORKSHOP PARTICIPATION

    The workshop seeks participation from people involved or interested in
    the problem of evaluation in language processing, and from the research
    and industrial communities that study and implement dialogue models for
    natural-language interaction systems.

    The first part of the workshop will specifically draw on the
    natural-language interaction community, such as the one developing at
    the confluence of SIGdial and SIGCHI, which will find in this workshop
    an atmosphere more flavored by issues related to computational
    linguistics (see, for example, the First SIGdial Workshop on Discourse
    and Dialogue).

    The second part of the workshop is intended to provide a forum for a
    broader audience, more in the spirit of the one that attended the
    LREC'2000 Satellite Workshop on Evaluation (see
    http://www.limsi.fr/TLP/CLASS), in particular offering an opportunity
    to people involved in language-engineering evaluation (e.g., the CLASS
    audience) in the context of national or transnational projects or
    programs, both in Europe and abroad.

    -------------------------------------------------------------

    SUBMISSION DETAILS

    Paper submissions should follow the two-column format of the ACL
    proceedings and should not exceed eight (8) pages, including
    references. We strongly recommend using the ACL LaTeX style files or
    Microsoft Word style files tailored for this year's conference; they
    are available from the ACL-2001 program committee Web site at
    http://acl2001.dfki.de/style/.

    Papers should be submitted electronically, as a LaTeX, Word, or PDF
    file, to either:

    Patrick Paroubek, pap@limsi.fr
    Karen Ward, kward@cs.utep.edu

    -------------------------------------------------------------

    TIMETABLE OF IMPORTANT DATES

    Deadline for workshop paper submissions: **April 22, 2001**
    Deadline for notification of workshop paper acceptance: May 6, 2001
    Deadline for camera-ready workshop papers: May 16, 2001
    Workshop date: July 6-7, 2001

    -------------------------------------------------------------

    WORKSHOP ORGANIZING COMMITTEE

    David G. Novick, UTEP
    novick@cs.utep.edu
    http://www.cs.utep.edu/novick

    Joseph Mariani, Limsi - CNRS
    mariani@limsi.fr
    http://www.limsi.fr/Individu/mariani

    Candy Kamm, AT&T Labs
    cak@research.att.com
    http://www.research.att.com/info/cak

    Patrick Paroubek, Limsi - CNRS
    pap@limsi.fr
    http://www.limsi.fr/Individu/pap

    Nils Dahlbäck, Linköping University
    nilda@ida.liu.se
    http://www.ida.liu.se/~nilda/

    Frankie James, NASA Ames Research Center
    fjames@riacs.edu
    http://www-pcd.stanford.edu/frankie/

    Karen Ward, UTEP
    kward@cs.utep.edu
    http://www.cs.utep.edu/kward

    -------------------------------------------------------------

    SCIENTIFIC COMMITTEE

    David G. Novick
    Joseph Mariani
    Candy Kamm
    Patrick Paroubek
    Nils Dahlbäck
    Frankie James
    Karen Ward
    Christian Jacquemin
    Niels Ole Bernsen
    Stephane Chaudiron
    Khalid Choukri
    Martin Rajman
    Robert Gaizauskas
    Donna Harman
    Lynette Hirschman (tentative)
    David Pallett (tentative)
    Carol Peters (tentative)
    Jose Pardo (tentative)
    Herman Steeneken (tentative)
    Oliviero Stock (tentative)
    Saïd Tazi
    Hans Uszkoreit (tentative)

    -------------------------------------------------------------

    SPONSORS

     ACL 2001
     CLASS
     ELRA
     ELSNET
     SIGdial

    -------------------------------------------------------------

    ADDITIONAL INFORMATION

    Additional information on the workshop, including accepted papers and
    the workshop schedule, will be made available as needed at
    http://www.limsi.fr/TLP/CLASS/eacl01.html


