Re: [Corpora-List] annotation of aligned texts

From: Nancy Ide (ide@cs.vassar.edu)
Date: Mon Jul 22 2002 - 01:41:41 MET DST

    On Thursday, July 18, 2002, at 09:59 AM, pamela forner wrote:
    > We are working with parallel texts aligned at word level and we are now
    > facing the problem of encoding the alignment information. We’d like the
    > annotation to be as conformant as possible to XCES standards for
    > parallel texts alignment, but we only found examples at sentence level.
    > Could anybody provide further information about XCES standards or is
    > anybody aware of other accepted conventions for annotation of texts
    > aligned at word level?

    It is true that there are examples only for the sentence level in the
    current (CES) documentation. However, we now have on-line (although as
    yet unannounced) a suite of XCES schemas to replace the DTDs. Using
    these, you can link to anything you want to--whether it is tagged (for
    words, this would be with <w> tags as per the XCES doc conventions) or
    not (in which case you use offset information in the xlink). Please have
    a look at the new XCES schemas at http://www.xml-ces.org.
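
    As a rough illustration of the tagged case described above, the
    following Python sketch builds a small stand-off alignment document
    whose links point at <w> elements by id. The element names (linkGrp,
    link, align), the "type" attribute, and the file names en.xml and
    it.xml are assumptions made for this example only; they are not taken
    from the XCES schemas, which should be consulted for the actual
    vocabulary. Only the use of xlink:href to point at a target comes from
    the message. The offset-based form for untagged text is not shown,
    since its exact syntax is not given here.

```python
import xml.etree.ElementTree as ET

# Standard XLink namespace; registering it makes the output use "xlink:".
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK_NS)


def build_word_alignment(pairs, src_doc, tgt_doc):
    """Build a hypothetical word-level alignment document.

    pairs   -- list of (source_word_id, target_word_id) tuples, where each
               id is assumed to be the id of a <w> element
    src_doc -- file name of the source-language text (assumed)
    tgt_doc -- file name of the target-language text (assumed)
    """
    # Element and attribute names here are illustrative, not XCES-confirmed.
    root = ET.Element("linkGrp", {"type": "word"})
    for src_id, tgt_id in pairs:
        link = ET.SubElement(root, "link")
        for doc, word_id in ((src_doc, src_id), (tgt_doc, tgt_id)):
            # Each pointer is an xlink:href of the form "file#id".
            ET.SubElement(link, "align", {f"{{{XLINK_NS}}}href": f"{doc}#{word_id}"})
    return root


# Two word links: source w1 with target w1, source w2 with target w3.
alignment = build_word_alignment([("w1", "w1"), ("w2", "w3")], "en.xml", "it.xml")
xml_text = ET.tostring(alignment, encoding="unicode")
print(xml_text)
```

    Keeping the links in a separate document leaves both texts untouched,
    so the same word ids can serve several alignments.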

    The schemas have not yet been made fully public for two reasons: (1) the
    new schemas for spoken data are not as yet finalized; and (2) there are
    some problems with various XML schema parsers, which are unfortunately
    inconsistent in their ability to handle data encoded according to the W3C
    specs. This means that our use of various features is not always
    accepted by a given parser, and we want to be able to make concrete
    recommendations about parsers etc. before going public. However, the
    XCES schemas as they exist now on the web site are reasonably robust,
    and there should be no problem with "upward compatibility" once we
    announce the official versions.

    Please contact me or suderman@cs.vassar.edu (the schema developer) if
    you have any problems with or questions about the schemas--we are
    anxious to help out anyone who is using them!

    Nancy Ide

    =======================================================

    Nancy Ide

    Professor and Chair
    Department of Computer Science, Vassar College
    Poughkeepsie, NY 12604-0520 USA
    Tel: +1 845 437-5988 Fax: +1 845 437-7498
    ide@cs.vassar.edu

    Chercheur Associe
    Equipe Langue et Dialogue, LORIA/CNRS
    Campus Scientifique - BP 239
    54506 Vandoeuvre-les-Nancy FRANCE
    Tel: +33 (0)3 83 59 20 47 Fax: +33 (0)3 83 41 30 79
    ide@loria.fr

    =======================================================



    This archive was generated by hypermail 2b29 : Mon Jul 22 2002 - 08:59:15 MET DST