Corpora: DTDnorm?

Tomaz Erjavec (Tomaz.Erjavec@ijs.si)
Tue, 10 Feb 1998 18:14:24 +0100

Lou Burnard writes:
> You cannot of course do this with TEILite. But TEILite is not meant to
> replace TEI -- it's just one way of simplifying the view you take of
> the TEI dtd. You'd do better to construct your own view -- including the
> participant description, and possibly omitting some of the bits of TEIlite
> which you're not actually using.

We've started work on a corpus of the Slovene language, and exactly
this problem cropped up: we started out with TEIlite and found that it
doesn't quite fit the bill. You can go some way with the extensions
file (-//TEI U5-1995//DTD TEI Lite 1.0 Extensions//EN) but, if nothing
else, you can't exclude TEIlite elements or entities.

Our modifications to TEIlite would be to add the corpus header, and to
keep tighter control over the (text) headers and over the entities for
our non-ASCII characters.

I'm now trying - using the 'TEIlite in TEI' files as templates - to
restrict the full TEI to get the DTD we need. The problem is that I
need to produce the 'final' DTD as a single file; the software used by
the people doing the programming can't handle a DTD spread over
multiple files, with parameter entities and all that. Is there a free
Unix DTD normalizer available out there somewhere?
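
To make concrete what I mean by normalizing, here is a minimal Python
sketch of the core step: expanding parameter entities so that the
output no longer refers to them. It is an illustration only, not
DTDnorm and not a real normalizer. It assumes the DTD has already been
concatenated into one string, it handles only internal parameter
entities with quoted literal values, and it ignores external
(SYSTEM/PUBLIC) entities, marked sections and comments, all of which
the TEI DTDs rely on. The names flatten_dtd and expand are made up for
this example.

```python
import re

# Internal parameter entity declarations, e.g. <!ENTITY % name "value">
ENTITY_DECL = re.compile(
    r'<!ENTITY\s+%\s+([\w.-]+)\s+(["\'])(.*?)\2\s*>', re.DOTALL)
# Parameter entity references, e.g. %name;
ENTITY_REF = re.compile(r'%([\w.-]+);')


def expand(text, entities):
    """Replace every known %name; reference; leave unknown ones as-is."""
    return ENTITY_REF.sub(
        lambda m: entities.get(m.group(1), m.group(0)), text)


def flatten_dtd(dtd_text):
    """Drop internal parameter entity declarations and expand their uses.

    As in SGML, the first declaration of a name wins, which is how a
    local modifications file overrides the defaults declared later.
    """
    entities = {}
    out = []
    pos = 0
    for m in ENTITY_DECL.finditer(dtd_text):
        # Expand references in the text preceding this declaration.
        out.append(expand(dtd_text[pos:m.start()], entities))
        name, value = m.group(1), m.group(3)
        if name not in entities:
            # References inside the value are expanded at declaration time.
            entities[name] = expand(value, entities)
        pos = m.end()
    out.append(expand(dtd_text[pos:], entities))
    return "".join(out)


if __name__ == "__main__":
    sample = (
        '<!ENTITY % a "X | Y">\n'
        '<!ENTITY % a "Z">\n'          # ignored: %a; is already declared
        '<!ENTITY % b "(%a;)*">\n'
        '<!ELEMENT p - - %b;>\n'
    )
    print(flatten_dtd(sample))
```

A usable tool would also have to follow external entities into the
other DTD files and evaluate INCLUDE/IGNORE marked sections, which is
where most of the work lies.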

(I did find DTDnorm, but that runs under DOS, where I'm severely
disadvantaged; as there are lots of DTD tools out there, all of which
have to parse the DTD, I hope that something else is available.)

Thanks,
Tomaz

-------------
Tomaz Erjavec | Dept. for Intelligent Systems E-8
email: tomaz.erjavec@ijs.si | Jozef Stefan Institute
www: http://nl.ijs.si/tomaz | Jamova 39
tel: (+386 61) 177-3-507 | SI-1000 Ljubljana
fax: (+386 61) 219-385 | Slovenia