Re: [Corpora-List] Is the TEI a waste of time?

From: Marco Baroni (baroni@sslmit.unibo.it)
Date: Fri Jun 27 2003 - 14:47:23 MET DST

    > TEI's role, in a world with XML in
    > it, is much harder to delineate

    But nowadays TEI is a form of XML, right? The way I see it, it's an XML
    language to represent linguistic data, like MathML is an XML language
    to represent mathematical formulas. So, ideally, if you have
    TEI-encoded data, you should be able to use any general purpose XML
    tool on them... right?
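To illustrate the point, here is a minimal sketch of reading TEI-style data with a general-purpose XML library (Python's standard `xml.etree.ElementTree`) and no TEI-specific software. The sample document is invented for illustration, and the namespace URI is the one used by later TEI P5 documents; files from other TEI versions may carry no namespace at all, in which case the prefix would simply be dropped.

```python
import xml.etree.ElementTree as ET

# Assumption: the document uses the TEI P5 namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical minimal TEI-style document, made up for this sketch.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <p>First paragraph.</p>
      <p>Second paragraph.</p>
    </body>
  </text>
</TEI>"""


def paragraphs(tei_xml):
    """Return the text of every <p> element, using only generic XML calls."""
    root = ET.fromstring(tei_xml)
    return [
        "".join(p.itertext()).strip()
        for p in root.findall(".//tei:p", TEI_NS)
    ]


if __name__ == "__main__":
    for text in paragraphs(SAMPLE):
        print(text)
```

Nothing in the function knows about TEI beyond the element name and namespace, which is roughly what "any general purpose XML tool" amounts to in practice.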

    Imho, TEI's usefulness (as with any standard) depends on how successful
    it is, on how many people use it.

    If everybody used TEI, then we would not have to spend time worrying
    about the format of our input and output data, data exchange would be a
    trivial issue, and one could write all sorts of TEI-aware tools knowing
    that they will be useful to many people. In this TEI-conformant world,
    the time you spend TEI-encoding the data would definitely be
    well-spent, since you would save a lot of time later when dealing with
    other people's data, and you would get access to all sorts of useful
    tools that can immediately understand your data. (And TEI seems to be
    flexible enough that a minimal TEILite-encoding does not look like sooo
    much work...)

    Obviously, this is not the current situation, and in the real world the
    presence of TEI-encoding can be a (minor) hassle, since many tools you
    may want to use (pos taggers, morphological analyzers, machine learning
    packages, databases, command-line programs, your own scripts) are not
    TEI-compatible, and TEI is not the easiest format to deal with (as
    compared to, e.g., tab-delimited text...)
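As a rough sketch of the kind of glue this requires, the snippet below flattens word-level XML markup into tab-delimited lines that a tagger, a spreadsheet, or a command-line tool could consume. The fragment is invented, and the assumption that each token is a `<w>` element carrying `lemma` and `pos` attributes is only for illustration; real corpora may encode tokens quite differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: one sentence with word-level annotation.
# The element and attribute names are assumptions made for this sketch.
SAMPLE = """<s>
  <w lemma="the" pos="DET">The</w>
  <w lemma="cat" pos="NOUN">cats</w>
  <w lemma="sleep" pos="VERB">sleep</w>
</s>"""


def to_tsv(xml_fragment):
    """Return one 'form<TAB>lemma<TAB>pos' line per <w> element."""
    root = ET.fromstring(xml_fragment)
    rows = []
    for w in root.iter("w"):
        form = (w.text or "").strip()
        # Missing attributes become empty fields rather than errors.
        rows.append("\t".join([form, w.get("lemma", ""), w.get("pos", "")]))
    return "\n".join(rows)


if __name__ == "__main__":
    print(to_tsv(SAMPLE))
```

The conversion is short, but it loses everything that is not a token attribute (structure, headers, nested markup), which is part of why moving between the two formats is a hassle rather than a non-issue.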

    I suppose that the best way for people in favor of TEI to convince
    others to adopt the standard would be to provide all sorts of cool
    TEI-conformant tools: programs helping (manual and automated)
    TEI-encoding, programs that perform all sorts of linguistic and
    statistical analyses of TEI-encoded data, indexers and fast searching
    engines for TEI-encoded corpora, TEI-db's, input/output conversion
    tools...

    Sara/Xara seems to be an excellent example of this sort of tool, but,
    as far as I know, it only runs under Windows and it is more of a
    self-contained program than something one could use in combination with
    other tools...

    Regards,

    Marco

    ---
    Marco Baroni
    SSLMIT, University of Bologna
    http://sslmit.unibo.it/~baroni
    


