Re: Corpora: POS disambiguation

Oliver Mason (oliver@clg9.bham.ac.uk)
Thu, 23 Oct 1997 09:15:47 +0100

Date: Wed, 22 Oct 1997 13:50:02 -0400
Reply-To: Adwait Ratnaparkhi <adwait@unagi.cis.upenn.edu>
From: Adwait Ratnaparkhi <adwait@unagi.cis.upenn.edu>
Organization: University of Pennsylvania
To: "D.H. Van Uytsel" <Donghoon.VanUytsel@esat.kuleuven.ac.be>
CC: corpora@hd.uib.no
Subject: Re: Corpora: POS disambiguation
Resent-Date: Wed, 22 Oct 1997 19:50:34 +0200

D.H. Van Uytsel wrote:
> I would like to tag a running text containing a few million words. It is not the
> focus of my research, so I can't spend too much time on this. As a poor
> researcher, I have looked around for some good freeware. For my purposes, it
> should be
> [..]
Adwait Ratnaparkhi wrote:
> I have written a statistical tagger based on a maximum entropy model,
> which I refer to as MXPOST (for lack of a better name).
> It is written in Java, and the executable (i.e., "bytecode") is free for
> research purposes.
> It should, in theory, run on any platform with a Java interpreter.

I also have written a (probabilistic) tagger which consists of a client
(written in Java) and a server (written in C). Training the tagger is
extremely fast: it just involves re-formatting the pre-tagged training
corpus. It is also independent of language or tagset. Preliminary
evaluations for Swedish (by Daniel Ridings) and Romanian (by Dan Tufis)
have given error rates of about 3%.
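The message does not describe the tagger's internal model, so the following is only a minimal Python sketch, under stated assumptions, of why "training" a probabilistic tagger can amount to a single pass over a pre-tagged corpus. It assumes a hypothetical word/TAG input format and a simple bigram (HMM-style) model decoded with Viterbi, and it measures error rate per token. The functions read_tagged, train, viterbi and error_rate are illustrative names only, not QTAG's or MXPOST's actual interfaces.

```python
import math
from collections import Counter, defaultdict


def read_tagged(lines, sep="/"):
    """Parse lines of whitespace-separated word/TAG tokens (an assumed
    format) into sentences of (word, tag) pairs."""
    sentences = []
    for line in lines:
        sentence = []
        for tok in line.split():
            word, _, tag = tok.rpartition(sep)  # split on the last separator
            sentence.append((word, tag))
        if sentence:
            sentences.append(sentence)
    return sentences


def train(sentences):
    """'Training' is only counting word/tag and tag/tag co-occurrences.
    Tags are opaque strings, so nothing depends on the language or tagset."""
    lexicon = defaultdict(Counter)      # word -> Counter of tags seen with it
    transitions = defaultdict(Counter)  # previous tag -> Counter of next tags
    tag_totals = Counter()              # tag -> corpus frequency
    for sentence in sentences:
        prev = "<s>"                    # sentence-start pseudo-tag
        for word, tag in sentence:
            lexicon[word.lower()][tag] += 1
            transitions[prev][tag] += 1
            tag_totals[tag] += 1
            prev = tag
    return lexicon, transitions, tag_totals


def viterbi(words, model):
    """Most probable tag sequence under a bigram model P(t|prev) * P(w|t)."""
    lexicon, transitions, tag_totals = model
    tags = list(tag_totals)

    def trans(prev, t):
        row = transitions[prev]  # add-one smoothing over the tagset
        return (row[t] + 1) / (sum(row.values()) + len(tags))

    def emit(word, t):
        counts = lexicon.get(word.lower())
        if counts is None:       # unknown word: let the context decide
            return 1.0 / len(tags)
        return counts[t] / tag_totals[t]

    # best[t] = (log probability, path) of the best sequence ending in tag t
    best = {"<s>": (0.0, [])}
    for word in words:
        new_best = {}
        for t in tags:
            e = emit(word, t)
            if e == 0:
                continue         # tag never seen with this known word
            score, path = max(
                ((lp + math.log(trans(prev, t)) + math.log(e), p)
                 for prev, (lp, p) in best.items()),
                key=lambda pair: pair[0],
            )
            new_best[t] = (score, path + [t])
        best = new_best
    return max(best.values(), key=lambda pair: pair[0])[1]


def error_rate(gold_sentences, model):
    """Fraction of tokens whose predicted tag differs from the gold tag."""
    errors = total = 0
    for sentence in gold_sentences:
        predicted = viterbi([w for w, _ in sentence], model)
        for (_, gold), guess in zip(sentence, predicted):
            errors += gold != guess
            total += 1
    return errors / total if total else 0.0


corpus = read_tagged([
    "the/DT dog/NN runs/VBZ",
    "the/DT cat/NN sleeps/VBZ",
    "a/DT dog/NN barks/VBZ",
])
model = train(corpus)
print(viterbi(["the", "fox", "runs"], model))  # ['DT', 'NN', 'VBZ']
```

Here the unseen word "fox" is tagged NN purely from the surrounding tag transitions. A practical tagger would use wider context (e.g. tag trigrams) and a better unknown-word model; the point of the sketch is only that the training step is a counting pass with no iterative optimisation, which is why it can be so fast.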

The tagger is freely available for research purposes at
http://www-clg.bham.ac.uk/QTAG

Oliver Mason

-- 
//\\ computer officer | corpus research | department of english | school of  -
//\\ humanities | university of birmingham | edgbaston | birmingham b15 2tt  -
\\// united kingdom | phone +44-(0)121-414-6206 | fax +44-(0)121-414-5668/\  -
\\// mobile 07050 104504 | http://www-clg.bham.ac.uk | o.mason@bham.ac.uk\/  -