Question about the Xerox POS tagger

Jose T.A. Camara [gpl] (jtc@di.fct.unl.pt)
Mon, 30 Sep 96 12:30:07 GMT

Subject: Question about the Xerox POS tagger

Dear all,

I am working on my MS thesis on statistical NLP, and I need to run the
Xerox Part-of-Speech Tagger on Portuguese text. Unfortunately, it fails
before generating the HMM, more precisely when accessing the file
"training.txt".

It runs without problems for English.

I have adapted the tagger to Portuguese strictly according to the Xerox
document: I created a Portuguese lexicon (with exactly the same structure
as the English one) based on a Portuguese corpus, defined appropriate open
classes and symbol and transition biases, and specified all the required
new paths.

The modified tagger (tag-brown.lisp) includes the commands to compile and
load the tag-trainer and to run "train-on-files" on "training.txt", a file
containing some text in Portuguese.

I have no documentation on the structure of this "training.txt" file, so
I do not know whether it requires any special structure, or whether the
failure is caused by its format.

Should this file also include tags? If yes, in what structure?

Note: the tagger fails during the execution of the command:

(pdefsys:load-system :tag-english)

right after opening the training.txt file.
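
One way to separate a path or permission problem from a problem with the
file's contents is to read the file directly from the Lisp listener before
loading the system. The following is only a minimal sketch in standard
Common Lisp; the function name check-training-file and the variable
*training-file* are hypothetical and not part of the tagger, and the path
is an assumption that must be changed to the real location.

```lisp
;; Hypothetical path: replace with the actual location of training.txt.
(defparameter *training-file* "training.txt")

(defun check-training-file (path)
  "Return the number of lines in PATH, or NIL if the file cannot be read."
  (handler-case
      (with-open-file (in path :direction :input)
        ;; Read line by line until end of file, counting the lines.
        (loop for line = (read-line in nil nil)
              while line
              count t))
    (error (condition)
      ;; Print the condition so the real cause of the failure is visible.
      (format t "~&Cannot read ~A: ~A~%" path condition)
      nil)))

(check-training-file *training-file*)
```

If this returns a line count, the file itself can be opened and read, and
the failure more likely lies in how the tagger interprets its contents.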


I would appreciate any help or guidance in solving this problem.

Thank you so much

Jose Camara (jtc@fct.unl.pt)

My environment is:

System: SunOS Release 4.1.3_U1 (GENERIC+MZ+MULTICAST)
Lisp: CMU Common Lisp 17f
Tagger: tagger-1.2.tar
Guide: The Xerox Part-of-Speech Tagger Version 1.0 document
       by Doug Cutting and Jan Pedersen

The following instructions execute successfully:

(compile-file "src/pdefsys")
(load "src/pdefsys")
(pdefsys:compile-system :tdb-sysdcl)
(pdefsys:load-system :tdb-sysdcl)
(pdefsys:compile-system :tag-english :propagate t)

-------------------------------------------------------------------
Universidade Nova FCT
Lisbon, 27 September 1996