Corpora: IRCS Workshop on Linguistic Databases

From: Steven Bird (sb@unagi.cis.upenn.edu)
Date: Thu Apr 05 2001 - 17:35:09 MET DST

    Advance Announcement:
    (apologies for duplicates)

                       IRCS WORKSHOP ON LINGUISTIC DATABASES

                            University of Pennsylvania
                                 Philadelphia, USA
                                11-13 December 2001

                   http://www.ldc.upenn.edu/annotation/database/

                   Sponsored by the National Science Foundation
                and the Institute for Research in Cognitive Science

                                   Organized by:
                   Steven Bird, Peter Buneman and Mark Liberman
                  Department of Computer and Information Science,
           Department of Linguistics, and the Linguistic Data Consortium
                            University of Pennsylvania

    Linguistic databases are digital repositories of structured information
    intended to document natural language and natural communicative
    interaction. Over the last decade, linguistic databases have come to stand
    at the center of empirical research in the language sciences, and in the
    development of new human language technologies. Like genomic databases,
    linguistic databases are complex, evolving and richly annotated
    repositories, and pose interesting challenges for efficient representation,
    indexing and querying. And like most scientific databases, linguistic
    databases have made little use of standard database technology.

    The goals of the workshop are to take stock of existing research in
    linguistic databases, to identify the key problems, and to explore
    applications of current database research to these problems. More broadly,
    the workshop will help define the research questions of a new "linguistic
    database community" and initiate the ongoing interchange of relevant
    problems and results between this community and the database community at
    large.

    The workshop will address a selection of the following topics:

    MODELS:
    * models for text databases, speech databases, multimodal databases,
      typological databases, geographical databases (language maps),
      and metadata repositories
    * relational, object-oriented and semi-structured models for
      representing linguistic annotations
    * representations for specific linguistic datatypes (e.g. databases of
      aligned parallel text)
    * modelling temporal and (geo)spatial structure
    * critical analysis of existing linguistic databases

    LANGUAGES:
    * query of multilayer annotations
    * linguistic applications/extensions of XML query languages
    * analysis of existing ad hoc query languages
    * queries over temporal and (geo)spatial structure

    OTHER TOPICS:
    * database support (e.g. what standard database technology has proven
      worthwhile for linguistic databases?)
    * appropriate indexing methods for linguistic strings and structures
    * archiving and preservation
    * metadata standards serving as finding aids for linguistic databases
    * data provenance / data lineage
    * annotation servers

    Provisional Timetable

    Call for papers: posted in May
    Extended abstracts: due in August
    Final papers: due in November

    Website and Mailing List

    Subsequent announcements will be posted to this list, and on the workshop
    website: http://www.ldc.upenn.edu/annotation/database/

    Steven Bird, Peter Buneman and Mark Liberman

    --
    Steven Bird     http://www.ldc.upenn.edu/sb/       sb@ldc.upenn.edu
    Peter Buneman   http://www.cis.upenn.edu/~peter/   peter@cis.upenn.edu
    Mark Liberman   http://www.ldc.upenn.edu/myl/      myl@unagi.cis.upenn.edu
    


