(no subject)

corpora-list@cogsci.ed.ac.uk
Tue, 2 Sep 1997 10:11:45 +0100 (BST)

The HCRC Language Technology Group is pleased to announce a new
release of LT XML, the first high-performance publicly available XML
toolset written in C.

For further information and access to the software distribution, see

http://www.ltg.ed.ac.uk/software/xml/

The LT XML tool-kit includes stand-alone tools for a wide range of
processing tasks on well-formed XML documents, including searching and
extracting, down-translation (e.g. report generation, formatting),
tokenising and sorting. If you've been waiting for high-throughput
XML tools with simple command-line interfaces to explore the potential
of XML, LT XML is just what you need to get started. Basic throughput
is under 3 seconds/megabyte on a Pentium 133, fast enough to make
processing substantial XML datasets feasible.

LT XML is an integrated set of XML tools and a developers' tool-kit,
including a C-based API. As well as sources, this release includes
executable images for a range of platforms, including Windows 95 and
Windows NT, FreeBSD, Linux and Solaris. A preliminary partial
Macintosh version is also available. This release is restricted to
8-bit character input/output, and does NOT do validation, although it
does process and make use of DTDs in documents which include them.

LT XML tools can be pipelined together to achieve complex results.
This release includes the following tools:

* sggrep -- extract sub-parts of XML documents, using patterns over
element structure and text content;

* textonly -- extract text content only;

* sgsort -- reorder sub-elements within specified elements;

* sgmltrans -- pattern+action down-translation tool;

* sgrpg -- structure-based transformation tool;

* simple, simpleq -- event- and fragment-based examples of API use.

For special purposes beyond what the pre-constructed tools can
achieve, it is easy to extend their functionality or to create new
tools using the LT XML API, which provides both event-oriented and
tree-fragment-oriented access to the input document stream. A minimal
application takes less than half a page of C code.
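
To give a feel for what event-oriented access means, here is a minimal,
self-contained C sketch. It does NOT use the LT XML API: the names
scan_events, collect_text, text_only, EventKind and TextSink are
invented for illustration only. The scanner is deliberately naive and
handles just plain tags and text, roughly in the spirit of what a
text-extraction tool does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical event kinds; these are not LT XML types. */
typedef enum { EV_START_TAG, EV_END_TAG, EV_TEXT } EventKind;

/* Handler called once per event with the tag contents or the text run. */
typedef void (*EventHandler)(EventKind kind, const char *data,
                             size_t len, void *user);

/* Walk a simple well-formed fragment and report events in document order.
   Naive on purpose: no comments, CDATA, entities or DTDs, and an
   empty-element tag such as <br/> is reported as a start tag. */
static void scan_events(const char *doc, EventHandler handler, void *user)
{
    const char *p = doc;
    while (*p != '\0') {
        if (*p == '<') {
            const char *close = strchr(p, '>');
            if (close == NULL)
                return;                      /* unterminated tag: stop */
            if (p[1] == '/')
                handler(EV_END_TAG, p + 2, (size_t)(close - (p + 2)), user);
            else
                handler(EV_START_TAG, p + 1, (size_t)(close - (p + 1)), user);
            p = close + 1;
        } else {
            const char *end = strchr(p, '<');
            if (end == NULL)
                end = p + strlen(p);         /* trailing text run */
            handler(EV_TEXT, p, (size_t)(end - p), user);
            p = end;
        }
    }
}

/* Fixed-size output buffer filled by collect_text. */
typedef struct { char *buf; size_t cap; size_t len; } TextSink;

/* Handler that keeps text events and ignores tag events. */
static void collect_text(EventKind kind, const char *data,
                         size_t len, void *user)
{
    TextSink *sink = user;
    if (kind != EV_TEXT)
        return;
    if (sink->len + len >= sink->cap)        /* truncate, never overflow */
        len = sink->cap - sink->len - 1;
    memcpy(sink->buf + sink->len, data, len);
    sink->len += len;
    sink->buf[sink->len] = '\0';
}

/* Copy the text content of doc into out (NUL-terminated, at most cap
   bytes including the terminator) and return the number of bytes kept. */
size_t text_only(const char *doc, char *out, size_t cap)
{
    TextSink sink = { out, cap, 0 };
    assert(doc != NULL && out != NULL);
    if (cap == 0)
        return 0;
    out[0] = '\0';
    scan_events(doc, collect_text, &sink);
    return sink.len;
}
```

With this sketch, calling text_only on the fragment
"<p>Hello <b>world</b>!</p>" leaves "Hello world!" in the output
buffer. A tree-fragment style of access would instead hand the caller
a whole sub-element at a time; that is not shown here.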

LT XML is available to anyone free of charge for non-commercial purposes.