Re: [Corpora-List] Is the TEI a waste of time?

From: geoffrey.williams (geoffrey.williams@wanadoo.fr)
Date: Fri Jun 27 2003 - 11:04:21 MET DST

    Whilst I have been trying to observe and not react for a while, so as to cut my email writing time, I cannot but reply to this thread with an unequivocal no: the TEI is definitely not a waste of time, but a cornerstone of corpus linguistics.

    It is obvious that for the designer of a small corpus, one million tokens or fewer, markup is a considerable investment in time. However, if one holds, as I do, that a corpus is not a simple mass of data but a carefully compiled selection of texts, then we need a means to treat them as texts, to store both their general features and their particularities. This the TEI does.

    In my own work in the field of English for Academic Purposes, I tend not to use the corpus header but a standard individual header, so as to store all the bibliographic information and sociolinguistic parameters associated with the text. The depth of markup depends on my needs, and time, for an individual text. In this way I can move with ease from a fully annotated single text to a more lightly marked up corpus. This is possible because of the encoding possibilities of the TEI.

    Education is very much part of the answer. Easy access to vast amounts of downloadable data has meant that a number of "corpus linguists" neither know nor care about the niceties of corpus creation, and the whys and wherefores of selecting and marking up data. Ease of access has become the main criterion, potentially to the detriment of the discipline itself. Easy solutions do not necessarily answer the most pertinent questions.

    It is true that all this takes time, but if we throw out all that is time-consuming drudgery from corpus linguistics, we may find that we have thrown out our text baby with the corpus bathwater and are only left with ready-made corpora for ready-made answers.

    Back to some time-consuming markup.

    Geoffrey

    ***********************************************************

    Dr. Geoffrey C. Williams,
    Département Langues Etrangères Appliquées
    U.F.R. Lettres et Sciences Humaines
    4, rue Jean Zay
    B.P. 92116
    56321 LORIENT Cedex
    FRANCE

    tél : 33 (0) 2 97 87 29 68
    fax : 33 (0) 2 97 87 29 70

    email : Geoffrey.Williams@univ-ubs.fr

    http://www.univ-ubs.fr/crellic

    ***************************************************

    ----- Original Message -----
    From: "Mcenery, Tony" <eiaamme@exchange.lancs.ac.uk>
    To: "Simpson, Rita" <ritacsim@umich.edu>; "Christopher Brewster" <C.Brewster@dcs.shef.ac.uk>; <corpora@uib.no>
    Sent: Thursday, June 26, 2003 4:54 PM
    Subject: RE: [Corpora-List] Is the TEI a waste of time?

    Dear Rita,

    Yes, I have some sympathy with the point you make. The thing that has attracted me to the TEI in the past, though, is that once the effort is made to get to grips with it (and it is daunting), there is usually a well-thought-through solution contained in it for almost any problem situation you come across in encoding a corpus! With that said, it is a clear theme of the posts so far that there is, at the very least, an advocacy issue related to the TEI in corpus linguistics, which is interesting.

    Best,

    T

    -----Original Message-----
    From: Simpson, Rita [mailto:ritacsim@umich.edu]
    Sent: 26 June 2003 14:09
    To: Christopher Brewster; corpora@uib.no
    Subject: RE: [Corpora-List] Is the TEI a waste of time?

    Interesting question...

    > There are two issues here:
    > 1. Ignorance and confusion. Most people have only a vague idea what
    > TEI is or does or what it is good for. There would need to be an
    > effort to (re-)educate the potential users of TEI. Does TEI do
    > something different from XML? Absurd question I know but that is the
    > kind of confusion which I suspect exists.
    >
    > 2. Complexity. When it was introduced many people reacted against it
    > as too complex. Now they have all adopted XML, RDF etc. which are
    > much more complicated to use. So potential users' perception would
    > now be ripe for a re-presentation of TEI.
    >

    Related to both of these issues is that of the documentation available
    to educate people & help potential users understand what TEI is, does,
    & is good for. A research assistant & I have recently been poring over a
    couple chapters of the TEI guidelines, looking for guidelines & relevant
    examples to add some markup to our already (mostly) TEI-conformant
    corpus markup scheme. Although the documentation is extensive, it is
    inadequate in many ways, missing examples, not very good at giving a
    larger picture to people who aren't sure if they need/want the TEI at
    all or who just need some pointers to a few relevant sections. If the
    only people who can read the documentation and make use of it are
    information/library science people who are specifically trained in that
    area, then it's no wonder linguists & others who are in the business of
    building corpora are not using it or promoting it.

    Rita Simpson

    -------------------------------------------------------------------------
    Project Director, Michigan Corpus of Academic Spoken English (MICASE)
    English Language Institute
    University of Michigan
    -------------------------------------------------------------------------



    This archive was generated by hypermail 2b29 : Fri Jun 27 2003 - 11:01:03 MET DST