TEI for POS-tagging? Query language?

Torbjoern Lager (lager@ling.gu.se)
Wed, 5 Apr 95 10:00:41 +0200

I understand that the Text Encoding Initiative (TEI) has chosen SGML
for linguistically motivated coding of corpus texts. I need to write a
short piece describing this approach.

Now I'm not sure I have understood exactly how to use SGML for the
purpose of (say) part-of-speech coding. I imagine something along the
following lines (I'm not concerned with the actual tagset used, but
with the general idea):

<pn>John</pn><v>loves</v><pn>Mary</pn><full_stop>.</full_stop>

Is this correct? If this isn't a good example of the TEI approach,
could someone please provide a better one?
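For illustration only, here is a minimal Python sketch of reading the
flat element-per-tag markup above back into (word, tag) pairs. It
assumes exactly the shape of the example: no attributes, no nesting and
no omitted end tags. Real SGML allows all of these, so a proper SGML
parser would be needed in practice. The function name read_tagged is
hypothetical.

```python
import re

# One flat element <tag>content</tag>; the back-reference \1 forces the
# end tag to match the start tag. Assumes no attributes, no nesting and
# no omitted end tags (all of which real SGML permits).
ELEMENT = re.compile(r"<([A-Za-z_][\w.-]*)>(.*?)</\1>")

def read_tagged(text):
    """Return (word, tag) pairs from flat element-per-tag markup."""
    return [(word, tag) for tag, word in ELEMENT.findall(text)]

sample = "<pn>John</pn><v>loves</v><pn>Mary</pn><full_stop>.</full_stop>"
print(read_tagged(sample))
# [('John', 'pn'), ('loves', 'v'), ('Mary', 'pn'), ('.', 'full_stop')]
```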

Also, what would a good example of phrase-structure coding look like,
say for the sentence "John loves Mary"?
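One possibility, offered only as a guess and not as the TEI's actual
scheme, is to nest phrase elements around the word elements. The
Python sketch below renders a bracketed tree for "John loves Mary" as
nested SGML-style markup; the phrase labels s, np and vp are
hypothetical names chosen to match the style of the tagset above.

```python
def to_markup(node):
    """Render a (label, body) tree as nested SGML-style elements."""
    label, body = node
    if isinstance(body, str):
        # Leaf: label is the part-of-speech tag, body is the word itself.
        inner = body
    else:
        # Phrase: label is the (hypothetical) category, body is a list of subtrees.
        inner = "".join(to_markup(child) for child in body)
    return f"<{label}>{inner}</{label}>"

tree = ("s", [("np", [("pn", "John")]),
              ("vp", [("v", "loves"), ("np", [("pn", "Mary")])])])
print(to_markup(tree))
# <s><np><pn>John</pn></np><vp><v>loves</v><np><pn>Mary</pn></np></vp></s>
```

The word-level elements stay exactly as in the flat encoding; the
phrase structure is added purely by nesting.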

Another question: does the TEI consider it its task to design and
specify a _query_language_ for searching SGML-coded texts, or is that
problem left open to the implementors of tools? For example, how would
one specify a search for verbs immediately followed by nouns? Or a
concordance of adjectives _not_ followed by nouns? As I understand it,
tools that do useful things with TEI/SGML-coded text are not yet
available. Wouldn't a careful, formal specification of a query
language speed up the development of such tools?
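Without a query language, each such search has to be hand-coded. The
Python sketch below does this for the two example queries, working on
a flat list of (word, tag) pairs. It is not a TEI query language; the
tag names v, n and adj, the function names and the sample sentence are
all hypothetical.

```python
def verbs_before_nouns(tokens, verb="v", noun="n"):
    """Return (verb, noun) word pairs where a verb immediately precedes a noun."""
    return [(w1, w2) for (w1, t1), (w2, t2) in zip(tokens, tokens[1:])
            if t1 == verb and t2 == noun]

def adjectives_not_before_nouns(tokens, adj="adj", noun="n", width=2):
    """Concordance lines (left context, keyword, right context) for
    adjectives that are not immediately followed by a noun."""
    lines = []
    for i, (word, tag) in enumerate(tokens):
        next_tag = tokens[i + 1][1] if i + 1 < len(tokens) else None
        if tag == adj and next_tag != noun:
            left = " ".join(w for w, _ in tokens[max(0, i - width):i])
            right = " ".join(w for w, _ in tokens[i + 1:i + 1 + width])
            lines.append((left, word, right))
    return lines

# Hypothetical tagged sentence.
tokens = [("The", "det"), ("old", "adj"), ("man", "n"), ("loves", "v"),
          ("wine", "n"), ("and", "conj"), ("is", "v"), ("happy", "adj"),
          (".", "full_stop")]
print(verbs_before_nouns(tokens))           # [('loves', 'wine')]
print(adjectives_not_before_nouns(tokens))  # [('and is', 'happy', '.')]
```

A formal query language would let such searches be stated
declaratively instead of being rewritten as code for every new query.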

Thanks in advance.

Best regards,
Torbjoern Lager

---------------------------------**-------------------------------------*------

Torbjoern Lager E-mail: lager@ling.gu.se
Department of Linguistics Phone: +46 31 7731175
University of Gothenburg Fax: +46 31 7734853
Renstroemsparken
412 98 Gothenburg
Sweden

**-*-----*-*------------------*------------------------------------------------