English POS-tagging by email

E S Atwell (eric@scs.leeds.ac.uk)
Thu, 23 Jan 1997 11:14:53 GMT

We have set up an experimental email server for annotating English text
with grammatical Part-of-Speech tags. We are aware that several POS-taggers
are already available; ours is different in that (a) you can use it
via email, without having the bother of installing it on your machine;
(b) you can choose your preferred set of POS-tag categories, from 8 standard
sets which have been used in English corpus linguistics research.
The amalgam-tagger is based on the Brill tagger, retrained with 8 POS-tagged
English corpora.

This service is provided under the UK EPSRC-funded project
AMALGAM: Automatic Mapping Among Lexico-Grammatical Annotation Models,
see http://agora.leeds.ac.uk/amalgam/

To use the service, mail your English text to: amalgam-tagger@scs.leeds.ac.uk
with one of these tag-set names as the SUBJECT:
Brown, ICE, LLC, LOB, Parts, POW, SEC, UPenn
We advise you not to mail files larger than 50Kb: the tagged text may
cause your mailer problems as it can be more than double the size of your
original message.
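
As a rough illustration of the request format described above, here is a
minimal Python sketch that builds (but does not send) such a message. The
function name build_request, the sender address, and the reading of "50Kb"
as 50 * 1024 bytes are assumptions made for this example; they are not part
of the service. The ASCII check reflects the help file's request for ASCII
text.

```python
from email.message import EmailMessage

TAGGER_ADDRESS = "amalgam-tagger@scs.leeds.ac.uk"
SCHEMES = {"Brown", "ICE", "LLC", "LOB", "Parts", "POW", "SEC", "UPenn"}
MAX_BYTES = 50 * 1024  # advisory limit; "50Kb" read as 50 KiB (assumption)


def build_request(text, scheme, sender):
    """Return an EmailMessage asking for `text` to be tagged with `scheme`."""
    if scheme not in SCHEMES:
        raise ValueError(f"unknown tag-set: {scheme!r}")
    # Encoding as ASCII raises UnicodeEncodeError if the text is not ASCII.
    size = len(text.encode("ascii"))
    if size > MAX_BYTES:
        raise ValueError(f"text is {size} bytes; advised maximum is {MAX_BYTES}")
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = TAGGER_ADDRESS
    msg["Subject"] = scheme
    msg.set_content(text)
    return msg


# Hypothetical sender address, for illustration only.
request = build_request("If he's not in action, he's in traction!", "LOB",
                        "you@example.org")
print(request["To"], "|", request["Subject"])
```

Delivery is deliberately left out; the message object could be handed to
whatever mail setup you already use.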

For more information, mail amalgam-tagger@scs.leeds.ac.uk, Subject: help
- this helpfile is appended below to save you having to request it...

We are NOT keeping permanent copies of your texts, but we ARE monitoring
who is using the service (email addresses and file sizes).
PLEASE LET ME KNOW IF YOU FIND A GOOD USE FOR THIS SERVICE
- not so I can start charging you, but to help our case for follow-up grants!

Eric Atwell, John Hughes, Clive Souter, Sean Wilcock,
Centre for Computer Analysis of Language And Speech (CCALAS)
Artificial Intelligence Division, School of Computer Studies
The University of Leeds, LEEDS LS2 9JT, Yorkshire, England
TEL:0113-2335761 FAX:0113-2335468 EMAIL:eric@scs.leeds.ac.uk
WWW: http://agora.leeds.ac.uk/scs/public/staff/eric.html
http://agora.leeds.ac.uk/amalgam/

*****************************************************************************

AMALGAM tagger Help file
~~~~~~~~~~~~~~~~~~~~~~~~

Email software written by Sean Wilcock and John Hughes.
Tagging software written by John Hughes.

For tagging requests, please mail amalgam-tagger@scs.leeds.ac.uk
For questions about the email service, please mail sean@scs.leeds.ac.uk

Further information on the AMALGAM tagger can be found on our Web site:

http://agora.leeds.ac.uk/amalgam/

A description of the eight tag-sets can be found at:

http://agora.leeds.ac.uk/amalgam/tagsets/tagmenu.html

You can request eight types of tagging. Please use just the following
abbreviations for the tagging schemes in the subject line of your mail message:

Name:                                    Abbreviation:

1) Brown Corpus                          Brown
2) International Corpus of English       ICE
3) London-Lund Corpus                    LLC
4) Lancaster-Oslo/Bergen Corpus          LOB
5) UNIX parts                            Parts
6) Polytechnic of Wales Corpus           POW
7) Spoken English Corpus                 SEC
8) University of Pennsylvania Corpus     UPenn

Each tagging scheme that you specify will produce its own mailed reply.

By default, the tagger applies our tokeniser to every scheme named in the
subject line until the word 'notoken' is encountered. Schemes named after
'notoken' are tagged without tokenisation, and the word 'token' switches
tokenisation back on. You can therefore toggle between the two by inserting
'token' and 'notoken' between groups of scheme names. An example of tokenised
output is given later.

The tagger can also be used in `verbose' mode, which appends a detailed
description of the syntactic role of each tag to each line. By default,
the tagger does not use verbose mode until the word `verbose' is
encountered on the subject line. The tagger will revert to not using
verbose mode if `noverbose' is encountered on the subject line. The use
of `verbose' and `noverbose' can be toggled.

For example,

to: amalgam-tagger@scs.leeds.ac.uk
subject: ICE LOB notoken SEC verbose Parts token noverbose UPenn

Our tokeniser will be used when tagging ICE, LOB and UPenn, and will not be
used when tagging SEC and Parts. Verbose mode will be used *only* for the
output of the *Parts* scheme.
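
The toggling rules can be made concrete with a short Python sketch that
reads a subject line from left to right and records the settings in force
for each scheme. This is only an illustration of the behaviour described
here, not the server's actual code; the function name parse_subject and the
choice to match words case-sensitively and reject unknown words are
assumptions.

```python
SCHEMES = {"Brown", "ICE", "LLC", "LOB", "Parts", "POW", "SEC", "UPenn"}


def parse_subject(subject):
    """Return a list of (scheme, tokenise, verbose) tuples in subject order."""
    tokenise, verbose = True, False  # defaults stated in the help file
    requests = []
    for word in subject.split():
        if word == "token":
            tokenise = True
        elif word == "notoken":
            tokenise = False
        elif word == "verbose":
            verbose = True
        elif word == "noverbose":
            verbose = False
        elif word in SCHEMES:
            # Each scheme picks up whatever settings are active at this point.
            requests.append((word, tokenise, verbose))
        else:
            # Assumption: unknown words are treated as errors in this sketch.
            raise ValueError(f"unrecognised subject word: {word!r}")
    return requests


subject = "ICE LOB notoken SEC verbose Parts token noverbose UPenn"
for scheme, tokenise, verbose in parse_subject(subject):
    print(scheme, tokenise, verbose)
# ICE True False
# LOB True False
# SEC False False
# Parts False True
# UPenn True False
```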

In the body of the message please enclose the text you wish to be tagged in
ASCII format. When your request has been dealt with, the tagged text will
be returned in vertical format.

For example,

to: amalgam-tagger@scs.leeds.ac.uk
subject: verbose LOB
message body: If he's not in action, he's in traction!

gives the output:

if/CS conjunction, subordinating
he/PP3A pronoun, personal, nominative, 3rd person singular
's/BEZ verb "to be", present tense, 3rd person singular
not/XNOT negator
in/IN preposition
action/NN noun, singular, common
,/, comma
he/PP3A pronoun, personal, nominative, 3rd person singular
's/BEZ verb "to be", present tense, 3rd person singular
in/IN preposition
traction/NN noun, singular, common
!/! exclamation mark
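
If you want to process replies like the one above by program, a sketch along
the following lines may help. It assumes each line holds a word/TAG pair,
optionally followed by whitespace and the verbose description, and that tags
themselves never contain a slash. The names TaggedToken and parse_tagged_line
are invented for this example.

```python
from typing import NamedTuple


class TaggedToken(NamedTuple):
    word: str
    tag: str
    description: str  # empty when no verbose description is present


def parse_tagged_line(line):
    """Split one 'word/TAG description' line into its three parts."""
    fields = line.strip().split(None, 1)
    if not fields:
        raise ValueError("empty line")
    pair = fields[0]
    description = fields[1] if len(fields) > 1 else ""
    # Split on the last slash so a word that itself contains "/" stays whole.
    word, slash, tag = pair.rpartition("/")
    if not slash:
        raise ValueError(f"no word/TAG pair in line: {line!r}")
    return TaggedToken(word, tag, description)


reply = """if/CS conjunction, subordinating
's/BEZ verb "to be", present tense, 3rd person singular
,/, comma"""
for line in reply.splitlines():
    print(parse_tagged_line(line))
```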

Note that tokenisation has taken place by default. The first word, "If",
has been decapitalised; the conjoined word "he's" has been split into its
constituent parts; and the punctuation has been stripped from the words.
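
To make these three effects concrete, here is a very rough Python
approximation. It is not our tokeniser, and it is only meant to reproduce
the behaviour seen in this one example; the regular expression and the name
rough_tokenise are assumptions, and real text (quotations, "don't", and so
on) would need more care.

```python
import re

# A clitic such as "'s", a run of word characters, or one punctuation mark.
TOKEN_PATTERN = re.compile(r"'\w+|\w+|[^\w\s]")


def rough_tokenise(text):
    """Split text into tokens and decapitalise the first one."""
    tokens = TOKEN_PATTERN.findall(text)
    if tokens:
        # Lower-case only the initial letter of the first token.
        tokens[0] = tokens[0][:1].lower() + tokens[0][1:]
    return tokens


print(rough_tokenise("If he's not in action, he's in traction!"))
# ['if', 'he', "'s", 'not', 'in', 'action', ',', 'he', "'s", 'in', 'traction', '!']
```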

This is an experimental prototype of the automatic email tagger, so please
be understanding of any problems. If you do have any problems
accessing this tagger or have any bugs to report, please email:
sean@scs.leeds.ac.uk.

Please also let us know if you find this tagger useful.

(If you wish to see this help file only, please send a blank message with
'help' as the subject.)

*****************************************************************************