[Corpora-List] Experience with linguistic annotation in separate files?

From: Scott James Cederberg (cederber@csli.Stanford.EDU)
Date: Fri Apr 18 2003 - 05:19:57 MET DST

    Hello,

            Does anyone out there have some experience working with
            corpora that have linguistic annotation (e.g. for part of
            speech, syntax, multiword expressions, or word senses) kept in
            files separate from the text itself?

            This is the system recommended by the CES and XCES corpus
            encoding standards, and the TEI guidelines also provide a
            mechanism for putting tags in one document that indicate links
            to another document.

            I'm trying to get my mind around how best to enable software
            to access corpus annotation in such a format. Ideally such
            access could be provided using standard XML formats and tools,
            like XPath and XSLT.
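
            As a rough illustration of the kind of access in question,
            here is a minimal Python sketch using only the standard
            library. The element and attribute names (w, ann, id, target,
            tag) are hypothetical and much simpler than the XLink and
            XPointer based linking that XCES and TEI actually describe;
            the point is only to show an annotation file being joined to
            a separate text file by identifier, with XPath-style queries.

```python
import xml.etree.ElementTree as ET

# Hypothetical base text: each token carries an id that annotations point to.
TEXT_XML = """
<text>
  <w id="w1">Time</w>
  <w id="w2">flies</w>
  <w id="w3">quickly</w>
</text>
"""

# Hypothetical standoff layer: part-of-speech tags linked to tokens by id.
POS_XML = """
<annotations type="pos">
  <ann target="w1" tag="NN"/>
  <ann target="w2" tag="VBZ"/>
  <ann target="w3" tag="RB"/>
</annotations>
"""


def merge_standoff(text_xml, ann_xml):
    """Join each annotation to the token it targets.

    Returns a list of (token_id, token_text, tag) tuples in annotation order.
    Annotations whose target id is missing from the text are skipped.
    """
    text_root = ET.fromstring(text_xml)
    ann_root = ET.fromstring(ann_xml)
    # Index the tokens once so that each link is resolved by dictionary lookup.
    tokens = {w.get("id"): w.text for w in text_root.iter("w")}
    merged = []
    for ann in ann_root.iter("ann"):
        target = ann.get("target")
        if target in tokens:
            merged.append((target, tokens[target], ann.get("tag")))
    return merged


def tokens_with_tag(text_xml, ann_xml, tag):
    """Return the text of every token annotated with the given tag.

    Uses the limited XPath subset supported by ElementTree, first on the
    annotation document and then on the text document.
    """
    text_root = ET.fromstring(text_xml)
    ann_root = ET.fromstring(ann_xml)
    result = []
    for ann in ann_root.findall(f".//ann[@tag='{tag}']"):
        w = text_root.find(f".//w[@id='{ann.get('target')}']")
        if w is not None:
            result.append(w.text)
    return result


if __name__ == "__main__":
    for token_id, word, pos in merge_standoff(TEXT_XML, POS_XML):
        print(token_id, word, pos)
    print(tokens_with_tag(TEXT_XML, POS_XML, "VBZ"))
```

            Building the id index once keeps link resolution cheap for
            large files; the per-annotation XPath lookup in the second
            function is simpler to read but rescans the text document for
            every annotation.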

            Any suggestions on how best to do this, pointers to software
            or APIs that work with modular annotation, etc. would be
            invaluable.

            Thanks for your help.

                                                            Scott

    -- 
    Scott Cederberg
    Researcher
    Infomap Project
    Computational Semantics Lab
    Center for the Study of Language and Information (CSLI)
    Stanford University

    http://infomap.stanford.edu/


