RE: [Corpora-List] Is the TEI a waste of time? / Lack of TEI software tools

From: Burnard Towers (lou.burnard@computing-services.oxford.ac.uk)
Date: Sun Jul 06 2003 - 00:00:13 MET DST


    Is there any software anywhere which *doesn't* operate on some sort of
    internal non-XML format?
    Is there / has there ever been any software anywhere which *didn't* convert
    an external form into something more compact and efficient before processing
    it?

    XML is a serialization of a tree structure. It's hard to imagine software
    which wouldn't store and process such structures non-serially! XML normally
    uses UTF-8 or UTF-16 to encode character data. It's highly probable that any
    efficient software would pack such character data into shorter representations.
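
    To make that concrete, here is a minimal Python sketch of what almost any
    tool ends up doing. The <s> and <w> element names, the sample text and the
    load_corpus helper are all invented for illustration; this is not any
    particular concordancer's format. The serial XML is parsed into an
    in-memory tree, and the word tokens are then packed into an array of
    integer IDs plus a lookup table.

```python
import xml.etree.ElementTree as ET
from array import array

# Hypothetical TEI-like fragment; the element names are assumptions for illustration.
SAMPLE = "<text><s><w>the</w><w>cat</w><w>saw</w><w>the</w><w>dog</w></s></text>"


def load_corpus(xml_string):
    """Parse serialized XML into a tree, then pack word tokens as integer IDs."""
    root = ET.fromstring(xml_string)  # serial text -> in-memory tree
    vocab = {}                        # word form -> integer ID
    tokens = array("I")               # compact stream of unsigned ints
    for w in root.iter("w"):          # document-order walk over <w> elements
        word = w.text or ""
        # Reuse the existing ID for a known word, else assign the next free one.
        tokens.append(vocab.setdefault(word, len(vocab)))
    return vocab, tokens


vocab, tokens = load_corpus(SAMPLE)
```

    The XML stays the interchange form; the array and the table are what the
    program actually searches.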

    So what?

    > -----Original Message-----
    > From: owner-corpora@lists.uib.no [mailto:owner-corpora@lists.uib.no]On
    > Behalf Of Oliver Mason
    > Sent: 04 July 2003 12:21
    > To: corpora@uib.no
    > Subject: [Corpora-List] Is the TEI a waste of time? / Lack of TEI
    > software tools
    >
    >
    > I would guess that most software uses internal, non-XML formats, as they
    > are generally easier to process from a programmer's point of view and
    > more efficient computationally; and if you've got large corpora time and
    > space efficiency are quite important. My own approach has always been
    > that TEI-style markup is fine for exchanging data, but when it is being
    > indexed and prepared for processing it'll be converted into some
    > tool-specific form.
    >
    > So, yes, the TEI is important, as it means that there is a standard for
    > the data that's coming in, even though corpus processing software will
    > typically not operate directly on that. Corpus tools should accept TEI
    > marked-up data, but might convert it into their own format.
    >
    > Oliver
    >
    > PS Of course I'm not denying that it is possible to write concordancing
    > software that works with XML data.


