Corpora: LREC Workshop on meta-descriptions and annotation schemes for multimedia Language Resources

From: Hamish Cunningham (hamish@dcs.shef.ac.uk)
Date: Wed Jan 19 2000 - 16:22:09 MET

    *******************************************************************
    *                                                                 *
    *                  First EAGLES/ISLE Workshop on                  *
    *          Meta-Descriptions and Annotation Schemas for           *
    *            Multimodal/Multimedia Language Resources             *
    *                                                                 *
    *                                                                 *
    *                LREC 2000 Pre-Conference Workshop                *
    *                         Athens, Greece                          *
    *                                                                 *
    *                        29 or 30 May 2000                        *
    *                                                                 *
    *                        1st Announcement                         *
    *                               and                               *
    *                         Call for Papers                         *
    *                                                                 *
    *                                                                 *
    *******************************************************************

    1. Workshop Outline
    ===================
    Currently, we can identify a number of trends in the community dealing
    with multimodal/multimedia language resources:

     - The number of resources is increasing rapidly.
     - Due to multimedia extensions and rich annotations the structural
       complexity of the resources is entering new dimensions.
     - The quantity of data to be handled is increasing enormously due to
       multimedia extensions, demanding new solutions.
     - Technological development suggests that more and more of
       these resources will be available on the Internet.

    The joint EC/NSF funded EAGLES/ISLE [1] initiative aims to create
    standards and guidelines that can be applied to natural interactivity
    and multimodal language resources (e.g. speech, gesture, facial
    expressions, manual languages) that support the creation, use, re-use
    of and access to such resources. As part of this initiative, the
    workshop will address current trends and discuss structures which
    could simplify and assist the creation and use of annotated
    multimodal/multimedia resources, the process of finding suitable
    resources, and accessing them, for instance, via the Web. The workshop
    will address two related areas: annotation schemas and
    meta-descriptions for multimodal/multimedia language resources.

    Meta-Descriptions for Multimodal/Multimedia Language Resources (MMLR)
    ---------------------------------------------------------------------
    As in other communities, it is time to bring the widely dispersed
    users of multimedia language resources together and start a discussion
    about meta schemas describing these resources. The goal is to have the
    available multimedia language resources associated with linked
    meta-descriptions which form a browsable and searchable universe open
    to the Internet. A known portal, standardised meta-descriptions and
    suitable tools will help users to find suitable resources for the task
    in mind more easily. This interest unites people from science and
    industry, as well as general users, who rely on annotated multimedia
    resources for scientific analysis, for training commercial components,
    and for many other purposes.

    Part of the proposed workshop will be dedicated to discussing the need
    for such a universe of linked meta-descriptions, the scope of the
    community, and existing work in this area. The nature of the
    meta-descriptions must also be discussed extensively, with an emphasis
    on questions such as: (1) Which elements describe the various language
    resources? (2) Is a minimal schema preferable to a more elaborate one?
    (3) How can we achieve flexibility within the standard
    meta-description? (4) How can we derive meta-descriptions
    automatically, so that creating them becomes a feasible task?

    The workshop will also discuss whether we can benefit from existing
    standards such as Dublin Core from the digital library community,
    whether initiatives in the telecommunication and broadcasting
    communities are relevant to our goals, and what impact the W3C's
    Resource Description Framework, an initiative toward a unifying
    framework for all these efforts, will have.

    Annotation Schemas for MMLR
    ---------------------------
    A second part of the workshop will be dedicated to discussing
    annotation schemas for multimodal/multimedia language resources. Until
    now the community's experience has been with text-only corpora, based
    mostly on orthographic transcriptions (with all their limitations),
    and with speech corpora, often associated with a single layer of
    orthographic transcription and specifically tailored to the needs of
    Automatic Speech Recognition systems. With the increasing power of
    computer technology, people are starting to build corpora based on
    several video and sound tracks, with rich annotations easily covering
    more than 50 layers. These annotations have complex time relationships
    and various dependencies between and within layers. It seems clear,
    therefore, that a large number of such complexly structured corpora
    will emerge, and that the community needs guidelines to limit their
    heterogeneity.

    At the Granada LREC conference we heard about initial projects that
    had implemented "Abstract Data Models" for such multimedia corpora
    [2]. In the meantime a broad discussion about the underlying universal
    structure for such annotations has also been initiated [3]. A number
    of projects in the US and Europe have been, and are being, funded to
    develop annotation and exploitation tools to cope with such complex
    multimedia databases. To guarantee a high degree of interoperability
    and unified access to the resources, it is time to have a separate
    workshop dedicated to the nature of annotation schemas. Only good
    agreement in this respect will limit the number of access tools needed
    to exploit such databases.

    The emergence of multimedia on computers has changed traditional
    views: direct media access allows us to refer to media time, which
    never changes, instead of referring only to transcriptions, which can
    be modified and are often inadequate for coding complex time
    relationships. However, the workshop will not only address
    theoretical matters such as the underlying common structure and
    abstract data models, but also raise questions of suitable
    representation formats, which are important for implementation.
    Formats suitable for open exchange and long-term archiving will not
    be the optimal choice for all types of program access, and vice
    versa. We expect that modern tools will have to rely on several
    co-existing representation formats. We also have to deal with the
    question of how to integrate existing text-based corpora, or corpora
    which are extended stepwise with media data afterwards.
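
    To make the idea of anchoring annotations to media time rather than
    to a transcription more concrete, here is a minimal sketch. It is
    only an illustration and not a proposed schema: the names Annotation,
    Tier and overlapping, and the use of milliseconds as the time unit,
    are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Annotation:
    """One labelled interval anchored to media time (milliseconds, assumed unit)."""
    start_ms: int
    end_ms: int
    value: str


@dataclass
class Tier:
    """One annotation layer, e.g. orthography or gesture (hypothetical layer names)."""
    name: str
    annotations: List[Annotation] = field(default_factory=list)

    def add(self, start_ms: int, end_ms: int, value: str) -> None:
        # Reject empty or reversed intervals so every annotation has a duration.
        if end_ms <= start_ms:
            raise ValueError("annotation must have positive duration")
        self.annotations.append(Annotation(start_ms, end_ms, value))


def overlapping(tier: Tier, start_ms: int, end_ms: int) -> List[Annotation]:
    """Return the annotations on `tier` that overlap the interval [start_ms, end_ms)."""
    return [a for a in tier.annotations
            if a.start_ms < end_ms and a.end_ms > start_ms]


# Two independent layers that share nothing but the media timeline.
words = Tier("orthography")
words.add(0, 400, "look")
words.add(400, 900, "there")

gesture = Tier("gesture")
gesture.add(450, 1200, "pointing")

# Relate the layers through media time: which words co-occur with the gesture?
point = gesture.annotations[0]
co_occurring = [a.value for a in overlapping(words, point.start_ms, point.end_ms)]
print(co_occurring)
```

    Because each layer refers only to the media timeline, a transcription
    can be corrected or a further layer added without touching the other
    layers; relationships between layers are computed from the time
    intervals when needed.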

    2. Call for Papers
    ==================
    The workshop will have two consecutive sessions: one will focus on
    Internet-accessible Meta-Descriptions of MMLR; the other will be
    dedicated to Annotation Schemas for MMLR. This workshop is seen as
    the first in a series which will help in understanding the complexity
    of the problems and the various approaches found so far. Each session
    will open with an invited talk to introduce the problem and define
    the scope, and will close with a summary from the organizers. The
    workshop will focus on oral contributions and leave ample time for
    broad discussion. Papers which contribute to these two topics are
    invited.

    Format of Submission
    --------------------
    Submissions should consist of an extended abstract of about one page
    (DIN A4) and a separate title page providing the following
    information: Official title of the paper; names and affiliations of
    the authors; full address of the first author including phone, fax,
    email, URL; required facilities. Only electronic submissions in ASCII,
    Word, or HTML format will be accepted. Submissions should be sent to:
    ISLE-2000@mpi.nl. Receipt of submissions will be acknowledged within
    3 days. If you do not receive an acknowledgement, the email may not
    have been delivered correctly.

    Proceedings
    -----------
    The workshop organizers will produce proceedings. Print-ready
    versions of the papers therefore have to be submitted as Word, PDF
    or PS files. They should not exceed 5 pages (DIN A4). These final
    versions have to be submitted electronically to the same email
    address: ISLE-2000@mpi.nl.

    Important Dates
    ---------------
    Deadline for submissions of papers: March 17th
    Notification of acceptance: April 3rd
    Final versions of papers for proceedings: May 12th
    Workshop: May 29th afternoon and May 30th morning

    3. Organizational Issues
    ========================
    Organizers of the workshop
    --------------------------
    P. Wittenburg, Technical Department, Max-Planck-Institute for
            Psycholinguistics, Nijmegen
    D. Roy, Natural Interactive Systems Laboratory, Faculty of Science and
            Engineering, University of Southern Denmark Odense
    H. Cunningham, Department of Computer Science, University of Sheffield

    Questions
    ---------
    For all questions with respect to the workshop focus, please use the
    email address: ISLE-2000@mpi.nl
    For all questions with respect to organisational issues, accommodation
    etc., please contact the LREC secretariat: LREC2000@ilsp.gr

    Information
    -----------
    Information about the workshop such as call, schedule, and program can
    be found on the web-page: http://www.mpi.nl/world/ISLE
    Information about the LREC conference can be found on the web-page:
    http://www.icp.grenet.fr/ELRA/lrec2000.html

    Registration
    ------------
    The registration fee for the workshop is:
            - 120 EURO for those not attending LREC
            - 80 EURO for those attending LREC
    Registration and payment are explained on the LREC web-page.

    Included in the registration fee are the proceedings and coffee at the
    breaks.

    Program Committee
    -----------------
    N.O. Bernsen (U Odense)
    S. Bird (U Penn)
    P. Bonhomme (LORIA Nancy)
    D. Broeder (MPI Nijmegen)
    H. Brugman (MPI Nijmegen)
    L. Burnard (U Oxford)
    N. Calzolari (ILC Pisa)
    K. Choukri (ELRA Paris)
    B. Comrie (MPI Leipzig)
    H. Cunningham (U Sheffield)
    U. Heid (U Stuttgart)
    N. Ide (Vassar College)
    T. McEnery (U Lancaster)
    B. MacWhinney (CMU Pittsburgh)
    L. Noldus (Noldus Wageningen)
    S. Piperidis (ILSP Athens)
    W. Peters (U Sheffield)
    L. Romary (LORIA Nancy)
    A. Russel (MPI Nijmegen)
    D. Roy (U Odense)
    D. Slobin (U Berkeley)
    S. Steininger (U München)
    S. Strömqvist (U Lund)
    H. Thompson (HCRC Edinburgh)
    Y. Wilks (U Sheffield)
    P. Wittenburg (MPI Nijmegen)
    A. Zampolli (ILC Pisa)

    [1] International Standards in Language Engineering project funded by
            EC and NSF
    [2] see http://www.dcs.shef.ac.uk/~hamish/dalr/
    [3] see http://www.ldc.upenn.edu/annotation/ and
            http://www.ltg.ed.ac.uk


