D.H. Van Uytsel wrote:
> I would like to tag a running text containing a few M words. It is not the
> focus of my research, so I can't spend too much time on this. As a poor
> researcher, I have looked around for some good freeware. For my purposes, it
> should be
> [..]
Adwait Ratnaparkhi wrote:
> I have written a statistical tagger based on a maximum entropy model,
> which I refer to as MXPOST (for lack of a better name).
> It is written in Java, and the executable (i.e., "bytecode") is free for
> research purposes.
> It should, in theory, run on any platform with a Java interpreter.
I have also written a (probabilistic) tagger, which consists of a client
(written in Java) and a server (written in C). Training the tagger is
extremely fast: it just involves re-formatting the pre-tagged training
corpus. The tagger is also independent of language and tagset. Preliminary
evaluations for Swedish (by Daniel Ridings) and Romanian (by Dan Tufis)
have given error rates of about 3%.
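To illustrate why training of this kind can be so fast, here is a minimal Python sketch of turning a pre-tagged corpus into frequency tables in a single pass. It is not QTAG's actual code, file format, or model: the `word/TAG` input format, the function name `build_tables`, and the `<s>` sentence-start marker are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def build_tables(tagged_lines, sep="/"):
    """Count word/tag and tag-to-tag frequencies from pre-tagged text.

    Assumes (hypothetically) one sentence per line, tokens written as word/TAG.
    """
    lexicon = defaultdict(Counter)      # word -> counts of tags seen with it
    transitions = defaultdict(Counter)  # previous tag -> counts of following tags
    for line in tagged_lines:
        prev = "<s>"  # hypothetical sentence-start marker
        for token in line.split():
            # Split on the last separator so words containing "/" survive.
            word, _, tag = token.rpartition(sep)
            if not word:
                continue  # skip tokens that carry no word/tag pair
            lexicon[word.lower()][tag] += 1
            transitions[prev][tag] += 1
            prev = tag
    return lexicon, transitions


# Tiny made-up corpus, for illustration only.
corpus = [
    "the/DET dog/NOUN barks/VERB",
    "the/DET run/NOUN ends/VERB",
    "dogs/NOUN run/VERB",
]
lexicon, transitions = build_tables(corpus)
```

Nothing in the sketch depends on a particular language or tagset: the tags are simply whatever strings appear in the training corpus, and a probabilistic tagger could estimate its probabilities from counts like these.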
The tagger is freely available for research purposes at
http://www-clg.bham.ac.uk/QTAG
Oliver Mason
--
computer officer | corpus research | department of english
school of humanities | university of birmingham
edgbaston | birmingham b15 2tt | united kingdom
phone +44-(0)121-414-6206 | fax +44-(0)121-414-5668
mobile 07050 104504 | http://www-clg.bham.ac.uk | o.mason@bham.ac.uk