Re: [Corpora-List] Is the TEI a waste of time?

From: Lars Aronsson (lars@aronsson.se)
Date: Fri Jun 27 2003 - 19:56:20 MET DST


    David Graff wrote:
    > I agree wholeheartedly with these points. However, it is possible to
    > devote all due attention and care to the "niceties, whys and wherefores"
    > without strict adherence to the full details of TEI specifications.

    I'm new to this list and I'm not a linguist. But I have run a website
    of Scandinavian/Nordic literature ("Project Runeberg", runeberg.org)
    since 1992, and I was among those who read the TEI P3 guidelines when
    they first appeared. My own conclusion was that TEI was too much, and
    I didn't see any immediate need for it in my application. None of my
    users have asked me to add TEI markup. Some have asked me why I don't
    use TEI, but that is a different thing. I respond by asking for a
    reason, and I never hear one. If *you* can explain why TEI markup of
    Project Runeberg's texts would make them *any more useful to you*, I
    might well give it a shot. This is an invitation. The explanation
    should detail what parts of TEI would be useful to you.

    Since 1998, Project Runeberg has added facsimile images to the old
    books and journals that we digitize. Since this is a zero-budget
    hobbyist project and the highest cost of digitization is proofreading,
    we publish facsimile images together with raw OCR text. This spring
    we got a system working in which any reader can correct errors and
    proofread these texts over the web, directly from the browser (earlier
    we used proofreading by e-mail). Usage is catching on and we are
    now producing high-quality texts at a good speed. Also this spring,
    we finished scanning two editions (20 + 38 volumes) of the Swedish
    encyclopedia "Nordisk familjebok" (http://runeberg.org/nf/), which now
    constitutes about half of our entire facsimile collection (45,000 of
    100,000 pages). The OCR text from the encyclopedia is 245 megabytes.
    A typical page would be http://runeberg.org/nfbe/0408.html

    I'd be interested in ways to make this collection more useful.

    -- 
      Lars Aronsson (lars@aronsson.se)
      Project Runeberg - free Nordic literature - http://runeberg.org/
    


