Re: [Corpora-List] Is the TEI a waste of time?

From: Elisabeth Burr (elisabeth.burr@uni-duisburg.de)
Date: Fri Jun 27 2003 - 15:30:26 MET DST

    I agree with every word Geoffrey Williams says, and I see the same dangers
    for corpus linguistics.

    Best
    Elisabeth Burr

    At 11:04 27.06.2003 +0200, you wrote:
    >Whilst trying to observe and not react for a while so as to cut my email
    >writing time, I cannot but reply to this string with an unequivocal, no,
    >TEI is definitely not a waste of time, but a cornerstone of corpus linguistics.
    >
    >It is obvious that for the designer of a small corpus, one million tokens
    >or fewer, markup is a considerable investment of time. However, if one
    >holds, as I do, that a corpus is not a simple mass of data, but a
    >carefully compiled selection of texts, then we need a means to treat them
    >as texts, to store both their general features and their particularities.
    >This the TEI does.
    >
    >In my own work in the field of English for Academic Purposes, I tend not
    >to use the corpus header but a standard individual header so as to store
    >all the bibliographic information and sociolinguistic parameters associated
    >with the text. The depth of markup depends on my needs, and time, for an
    >individual text. In this way I can move with ease from a fully annotated
    >single text to a more lightly marked up corpus. This is possible because
    >of the encoding possibilities of the TEI.
    >
    >Education is very much part of the answer. Easy access to vast amounts of
    >downloadable data has meant that a number of "corpus linguists" neither
    >know nor care about the niceties of corpus creation, and the whys and
    >wherefores of selecting and marking up data. Ease of access has become the
    >main criterion, potentially to the detriment of the discipline itself.
    >Easy solutions do not necessarily answer the most pertinent questions.
    >
    >It is true that all this takes time, but if we throw out all that is
    >time-consuming drudgery from corpus linguistics, we may find that we have
    >thrown out our text baby with the corpus bathwater and are only left with
    >ready-made corpora for ready-made answers.
    >
    >Back to some time-consuming markup.
    >
    >Geoffrey
    >
    >***********************************************************
    >
    >Dr. Geoffrey C. Williams,
    >Département Langues Etrangères Appliquées
    >U.F.R. Lettres et Sciences Humaines
    >4, rue Jean Zay
    >B.P. 92116
    >56321 LORIENT Cedex
    >FRANCE
    >
    >tél : 33 (0) 2 97 87 29 68
    >fax : 33 (0) 2 97 87 29 70
    >
    >email : Geoffrey.Williams@univ-ubs.fr
    >
    >http://www.univ-ubs.fr/crellic
    >
    >***************************************************
    >
    >
    >----- Original Message -----
    >From: "Mcenery, Tony" <eiaamme@exchange.lancs.ac.uk>
    >To: "Simpson, Rita" <ritacsim@umich.edu>; "Christopher Brewster"
    ><C.Brewster@dcs.shef.ac.uk>; <corpora@uib.no>
    >Sent: Thursday, June 26, 2003 4:54 PM
    >Subject: RE: [Corpora-List] Is the TEI a waste of time?
    >
    >
    >Dear Rita,
    >
    >Yes, I have some sympathy with the point you make. The thing that has
    >attracted me to the TEI in the past, though, is once the effort is made to
    >get to grips with it (and it is daunting) there is usually a well thought
    >through solution contained in it for almost any problem situation you come
    >across in encoding a corpus! With that said, it is a clear theme of the
    >posts so far that there is, at the very least, an advocacy issue related
    >to the TEI in corpus linguistics, which is interesting.
    >
    >Best,
    >
    >T
    >
    >-----Original Message-----
    >From: Simpson, Rita [mailto:ritacsim@umich.edu]
    >Sent: 26 June 2003 14:09
    >To: Christopher Brewster; corpora@uib.no
    >Subject: RE: [Corpora-List] Is the TEI a waste of time?
    >
    >
    >Interesting question...
    >
    > > There are two issues here:
    > > 1. Ignorance and confusion. Most people have only a vague idea what TEI
    > > is or does or what it is good for. There would need to be an effort to
    > > (re-)educate the potential users of TEI. Does TEI do something
    > > different from XML? Absurd question I know but that is the kind of
    > > confusion which I suspect exists.
    > >
    > > 2. Complexity. When it was introduced many people reacted against it as
    > > too complex. Now they have all adopted XML, RDF etc., which are much more
    > > complicated to use. So potential users' perception would now be ripe for
    > > a re-presentation of TEI.
    > >
    >
    >Related to both of these issues is that of the documentation available
    >to educate people & help potential users understand what TEI is, does,
    >& is good for. A research assistant & I have recently been poring over a
    >couple chapters of the TEI guidelines, looking for guidelines & relevant
    >examples to add some markup to our already (mostly) TEI-conformant corpus
    >markup scheme. Although the documentation is extensive, it is inadequate
    >in many ways: it is missing examples, and it is not very good at giving a
    >larger picture to people who aren't sure if they need/want the TEI at all
    >or who just need some pointers to a few relevant sections. If the only
    >people who can read the documentation and make use of it are
    >information/library science people who are specifically trained in that
    >area, then it's no wonder linguists & others who are in the business of
    >building corpora are not using it or promoting it.
    >
    >Rita Simpson
    >
    >-------------------------------------------------------------------------
    >Project Director, Michigan Corpus of Academic Spoken English (MICASE)
    >English Language Institute
    >University of Michigan
    >-------------------------------------------------------------------------

    HD Dr. Elisabeth Burr
    Romanistik
    Institut für Fremdsprachliche Philologien
    Fakultät 2: Geisteswissenschaften
    Universität Duisburg-Essen
    Standort Duisburg
    Geibelstr. 41
    47048 Duisburg

    http://www.uni-duisburg.de/Fak2/FremdPhil/Romanistik/Personal/Burr/


