Announcement: EMail Part-of-Speech Tagger

Oliver Jakobs (oliver@clg.bham.ac.uk)
Fri, 3 Feb 1995 10:48:58 +0000 (GMT)

Corpus Linguistics Group, University of Birmingham
==================================================

Announcement of the Experimental Part-of-Speech Tagging Service
---------------------------------------------------------------

We are pleased to announce an experimental E-Mail Tagging Service for
English texts. Further information on this service can be found below.
PLEASE READ IT CAREFULLY, ESPECIALLY THE LAST TWO SECTIONS!

What is Part-of-Speech Tagging?
===============================

Part-of-Speech Tagging is a linguistic procedure which attaches
word-class information to the words in a text. This information is
useful for further linguistic study, either for analysing the syntactic
structure of the sentences of the text or for statistical work such as
counting the distribution of the different word classes in text
corpora. A list of tags is available for reference (see below).
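
As an illustration of the statistical use mentioned above, the following
Python sketch counts how often each word class occurs in a tagged text.
The ``word_TAG`` token format, the tag names and the function name
``tag_distribution`` are assumptions made for this example only; the
actual output format of the service may differ.

```python
from collections import Counter


def tag_distribution(tagged_text, sep="_"):
    """Count the tags in a text whose tokens look like word<sep>TAG."""
    counts = Counter()
    for token in tagged_text.split():
        # Split at the last separator so words containing it still work.
        word, found, tag = token.rpartition(sep)
        if found and word and tag:
            counts[tag] += 1
    return counts


# Hypothetical tagged sentence; the tag labels are invented.
sample = "the_DET light_NOUN was_VERB very_ADV light_ADJ"
dist = tag_distribution(sample)
total = sum(dist.values())
for tag, n in dist.most_common():
    print(f"{tag}\t{n}\t{n / total:.2f}")
```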

How does it work?
=================

The tagging program in use at the Corpus Linguistics Group here at
Birmingham works stochastically: where a word is ambiguous (that is, it
can belong to several word classes, like "light", which can be a noun, a
verb or an adjective depending on its actual use), the program calculates
the most probable word class. Both the probability of the word belonging
to a certain word class and the probability of that word class occurring
at the specified position in the text are taken into account. Because
the method is probabilistic, the output cannot be guaranteed to be 100%
correct; the tagger is offered as a useful tool rather than a theory of
language. We have not formally measured the accuracy of the tagger, but
believe it to be quite high.
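
To make the idea concrete, here is a minimal Python sketch of one common
way to combine the two probabilities: a bigram model searched with the
Viterbi algorithm. It is not the Birmingham tagger itself. The tag names,
the tables ``LEXICAL`` and ``TRANSITION``, the ``FLOOR`` value and every
number in them are invented for illustration.

```python
import math

# LEXICAL[word][tag]: assumed probability that the word belongs to the tag.
LEXICAL = {
    "the": {"DET": 1.0},
    "they": {"PRON": 1.0},
    "light": {"NOUN": 0.5, "ADJ": 0.3, "VERB": 0.2},
    "fire": {"NOUN": 0.8, "VERB": 0.2},
}

# TRANSITION[(previous tag, tag)]: assumed probability of the tag
# occurring directly after the previous tag.
TRANSITION = {
    ("START", "DET"): 0.5, ("START", "PRON"): 0.4, ("START", "NOUN"): 0.1,
    ("DET", "NOUN"): 0.7, ("DET", "ADJ"): 0.3,
    ("PRON", "VERB"): 0.9, ("PRON", "NOUN"): 0.05,
    ("ADJ", "NOUN"): 0.9,
    ("VERB", "DET"): 0.6, ("VERB", "NOUN"): 0.2,
    ("NOUN", "VERB"): 0.5, ("NOUN", "NOUN"): 0.2,
}
FLOOR = 0.001  # small probability for tag pairs missing from the table


def tag(words):
    """Return the most probable tag sequence for a list of words."""
    # best[t] = (log probability, tag path) of the best sequence ending in t
    best = {"START": (0.0, [])}
    for word in words:
        candidates = LEXICAL.get(word.lower(), {"UNK": 1.0})
        new_best = {}
        for t, p_lex in candidates.items():
            # Choose the predecessor that maximises the path probability.
            score, path = max(
                (prev_score + math.log(TRANSITION.get((prev, t), FLOOR)),
                 prev_path)
                for prev, (prev_score, prev_path) in best.items()
            )
            new_best[t] = (score + math.log(p_lex), path + [t])
        best = new_best
    return max(best.values())[1]


print(tag("the light".split()))
print(tag("they light the fire".split()))
```

With these toy numbers "light" comes out as a noun after "the" but as a
verb after "they": the same word receives different tags depending on
its position in the text.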

What do you have to do?
=======================

The program is now publicly accessible by means of an experimental
E-Mail Tagging Service. In order to get an English text tagged, just
send this text to

tagger@clg.bham.ac.uk.

The text should not contain any formatting information, as this might
lead to undesirable results. The output of the tagging process is sent
back to you by email, together with an ID code for later reference.
If you want to have a long text tagged, please split it up into several
parts of about 50 KB each, since some mailers cannot cope with very
large messages.
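
One possible way to do the splitting is sketched below in Python. The
function name ``split_text``, the 50,000-byte limit and the choice to
break only at line boundaries are assumptions made for this example; the
service itself only asks for parts of about 50 KB.

```python
def split_text(text, max_bytes=50_000, encoding="utf-8"):
    """Split text into parts of at most max_bytes, breaking between lines.

    A single line longer than max_bytes is kept whole, so such a part
    can exceed the limit.
    """
    parts, current, size = [], [], 0
    for line in text.splitlines(keepends=True):
        line_size = len(line.encode(encoding))
        # Start a new part when this line would overflow the current one.
        if current and size + line_size > max_bytes:
            parts.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += line_size
    if current:
        parts.append("".join(current))
    return parts


# Example with generated filler text of roughly 150 KB.
long_text = ("word " * 10 + "\n") * 3000
for number, part in enumerate(split_text(long_text), start=1):
    print(number, len(part.encode("utf-8")))
```

Breaking between lines keeps sentences that fit on one line intact. A
sentence that spans a part boundary is still cut in two, which may affect
the tags assigned near the cut.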

If you want to receive the list of tag labels used, just send an empty
message with the subject line "taglist" to the above address.
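
The two kinds of request described above can be composed with Python's
standard ``email`` package, as in the following sketch. It only builds
the messages and does not send them. The helper ``build_request`` and the
sender address ``you@example.org`` are hypothetical.

```python
from email.message import EmailMessage

TAGGER_ADDRESS = "tagger@clg.bham.ac.uk"


def build_request(sender, body="", subject=""):
    """Build a plain-text message addressed to the tagging service."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = TAGGER_ADDRESS
    if subject:
        msg["Subject"] = subject
    if body:
        # Plain text only: the service asks for no formatting information.
        msg.set_content(body)
    return msg


# An empty message with the subject "taglist" requests the tag list.
taglist_request = build_request("you@example.org", subject="taglist")

# A message whose body is the text to be tagged.
tagging_request = build_request(
    "you@example.org", body="The light was very light.\n"
)

print(taglist_request["Subject"])
print(tagging_request.get_content())
```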

How do we profit from this?
===========================

Since we are using CPU time of one of our workstations, we would
naturally like to profit from this enterprise as well. We are always
trying to increase our own collection of English text data and so we
would like to keep a copy of each text that is sent to us. So don't
send us anything you don't want us to use (e.g. if you are not
allowed to pass a text on to other people).

Disclaimer
==========

We do not take any responsibility for damage caused by using the
Experimental E-Mail Tagging Service. We also do not guarantee that the
results obtained will be correct, even though we will do our best to
achieve this. If you notice any errors, we would be glad if you could
send a corrected version of your text (i.e. with wrong tags replaced by
correct ones) to tag-admin@clg.bham.ac.uk. We would then be able to
further enhance the quality of the tagger's output.

-------------------------------------------------------------------------
Corpus Linguistics Group, School of English, The University of Birmingham
WWW-access via http://clg1.bham.ac.uk/
Email: tag-admin@clg.bham.ac.uk