[Corpora-List] Is the TEI a waste of time? / Lack of TEI software tools

From: Oliver Mason (O.Mason@bham.ac.uk)
Date: Fri Jul 04 2003 - 13:20:58 MET DST


    I would guess that most software uses internal, non-XML formats, as they
    are generally easier to process from a programmer's point of view and
    more efficient computationally; and if you've got large corpora, time and
    space efficiency are quite important. My own approach has always been
    that TEI-style markup is fine for exchanging data, but when it is being
    indexed and prepared for processing it'll be converted into some
    tool-specific form.
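The conversion step described above can be sketched as follows. This is a minimal illustration, not any particular tool's format. It assumes the tokens are marked with TEI `<w>` elements in the TEI P5 namespace, and it uses a hypothetical internal form: a lexicon that maps each word form to an integer ID, a compact array of IDs for the running text, and a positional index. Keeping fixed-size integers instead of re-reading XML is one common way to get the time and space efficiency mentioned above.

```python
import xml.etree.ElementTree as ET
from array import array

# Assumption: TEI P5 namespace; TEI P4 documents (current in 2003) had none.
TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical sample document with one <w> element per token.
SAMPLE = f"""<TEI xmlns="{TEI_NS}">
  <text><body>
    <p><w>the</w> <w>cat</w> <w>sat</w> <w>on</w> <w>the</w> <w>mat</w></p>
  </body></text>
</TEI>"""


def tei_to_internal(xml_text):
    """Convert TEI <w> elements into a lexicon plus a compact array of word IDs."""
    root = ET.fromstring(xml_text)
    lexicon = {}          # word form -> integer ID, assigned in order of first occurrence
    tokens = array("I")   # running text as unsigned integer IDs, not strings
    for w in root.iter(f"{{{TEI_NS}}}w"):
        form = (w.text or "").strip()
        word_id = lexicon.setdefault(form, len(lexicon))
        tokens.append(word_id)
    return lexicon, tokens


def build_index(tokens):
    """Positional (inverted) index: word ID -> list of corpus positions."""
    index = {}
    for pos, word_id in enumerate(tokens):
        index.setdefault(word_id, []).append(pos)
    return index


lexicon, tokens = tei_to_internal(SAMPLE)
index = build_index(tokens)
print(list(tokens))            # [0, 1, 2, 3, 0, 4]
print(index[lexicon["the"]])   # [0, 4]
```

After this step, queries only touch the integer array and the index. The TEI source remains the exchange format.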

    So, yes, the TEI is important, as it means that there is a standard for
    the data that's coming in, even though corpus processing software will
    typically not operate directly on that. Corpus tools should accept TEI
    marked-up data, but might convert it into their own format.

    Oliver

    PS Of course I'm not denying that it is possible to write concordancing
       software that works with XML data.
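As the PS notes, concordancing directly on XML is possible. The following is a minimal keyword-in-context (KWIC) sketch that reads TEI `<w>` elements straight from the markup. The namespace, the sample text, and the `kwic` helper are illustrative assumptions. The sketch re-parses the whole document on every query, which is the kind of cost that motivates converting to an internal format for large corpora.

```python
import xml.etree.ElementTree as ET

# Assumption: TEI P5 namespace and one <w> element per token.
TEI_NS = "http://www.tei-c.org/ns/1.0"

SAMPLE = f"""<TEI xmlns="{TEI_NS}">
  <text><body>
    <p><w>the</w> <w>cat</w> <w>sat</w> <w>on</w> <w>the</w> <w>mat</w></p>
  </body></text>
</TEI>"""


def kwic(xml_text, node, width=2):
    """Return (left context, node, right context) for each occurrence of node."""
    root = ET.fromstring(xml_text)  # parses the full XML on every call
    words = [(w.text or "").strip() for w in root.iter(f"{{{TEI_NS}}}w")]
    hits = []
    for i, form in enumerate(words):
        if form == node:
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            hits.append((left, form, right))
    return hits


for left, node, right in kwic(SAMPLE, "the"):
    print(f"{left:>15} [{node}] {right}")
```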


