Corpora: Large Corpora & Annotation Standards at ANLP/NAACL2000

From: Priscilla Rasmussen (rasmusse@cs.rutgers.edu)
Date: Tue Apr 11 2000 - 23:54:28 MET DST

                         Large Corpora and Annotation Standards

                    http://www.cs.vassar.edu/~ide/ANLP-NAACL2000.html

                         Held in conjunction with ANLP/NAACL'00
                                 Seattle, Washington
                                  4 May 2000 1-6pm

                This meeting is intended to bring together researchers and
                developers from a variety of domains in text, speech,
                video, etc., to look broadly at the technical issues that
                bear on the development of software systems and standards
                for the annotation and exploitation of linguistic
                resources. The goal is to lay the groundwork for the
                definition of a data and system architecture to support
                corpus annotation and exploitation that can be widely
                adopted within the community.

                Among the issues to be addressed are:

                    - layered data architectures
                    - system architectures for distributed databases
                    - support for plurality of annotation schemes
                    - impact and use of XML/XSL
                    - support for multimedia, including speech and video
                    - tools for creation, annotation, query and access
                        of corpora
                    - mechanisms for linkage of annotation and primary
                        data
                    - applicability of semi-structured data models,
                        search and query systems, etc.
                    - evaluation/validation of systems and annotations

                The motivation for this meeting is the American National
                Corpus (ANC) effort, which should begin corpus creation
                within the year. We anticipate that the ANC will provide a
                significant resource for natural language processing, and
                we therefore seek to identify state-of-the-art methods for
                its creation, annotation, and exploitation. Also, as a
                national and freely available resource, the data and
                system architecture of the ANC is likely to become a de
                facto standard. We therefore hope to draw together leading
                researchers and developers to establish a basis for the
                design of a system to support the creation and use of the
                ANC.

                                           Provisional Program

                       Overview of the American National Corpus Effort
                          Nancy Ide and Catherine Macleod

                       Searching Linguistically Annotated Corpora
                          Chris Brew

                       Considerations for Large Corpus Annotation:
                       Intercoder Reliability
                          Rebecca Bruce and Janyce Wiebe

                       The XML Framework and Its Implications for Large
                       Corpus Access
                          Nancy Ide

                       The ATLAS System
                          John Henderson

                       Annotation Standards and Their Impact on Large
                       Corpus Development
                          Nicoletta Calzolari

                       A Framework for Multi-level Linguistic Annotation
                          Patrice Lopez and Laurent Romary

                       Discussion: Requirements for the ANC

              A related workshop will be held at the LREC conference on
              May 29-30, 2000. See
              http://www.cs.vassar.edu/~ide/anc/lrec.html.

              Organizer:

              Nancy Ide
              Professor and Chair
              Department of Computer Science
              Vassar College
              Poughkeepsie, NY 12604-0520 USA
              Tel: +1 914 437-5988 Fax: +1 914 437-7498
              ide@cs.vassar.edu


