Re: Xtract Users?

Pascale Fung (pascale@cs.columbia.edu)
Sat, 14 Oct 1995 13:32:48 -0400

We have extended Xtract to CXtract, a tool for extracting new words from
Chinese corpora. In our extension, we also require xtract0 to take untagged
input. The easy way to do that is to change every input tag field to a
dummy tag, pass the input in as tagged, set the tag field on the Xtract
command line to the dummy tag, and filter the dummy tags out after that
stage. Another way is to modify the program that reads the input
tags. Which program that is depends on the version of Xtract you are using.
Look in the online manual; you should be able to locate that particular
program in your package.
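
A rough sketch of the dummy-tag workaround, in Python. This is not part of
Xtract or CXtract. The tag name "XX", the "/" separator, and the
whitespace-separated token layout are all assumptions for illustration; the
real input format is the one given in the Xtract manual, so adjust
accordingly.

```python
# Hypothetical values: check the Xtract manual for the real tag field format.
DUMMY_TAG = "XX"
SEP = "/"


def add_dummy_tags(line, tag=DUMMY_TAG, sep=SEP):
    """Attach the same dummy tag to every whitespace-separated token."""
    return " ".join(f"{tok}{sep}{tag}" for tok in line.split())


def strip_dummy_tags(line, tag=DUMMY_TAG, sep=SEP):
    """Remove the dummy tag suffix from each token; leave other tokens alone."""
    suffix = sep + tag
    return " ".join(
        tok[: -len(suffix)] if tok.endswith(suffix) else tok
        for tok in line.split()
    )


if __name__ == "__main__":
    tagged = add_dummy_tags("new york stock exchange")
    print(tagged)
    print(strip_dummy_tags(tagged))
```

You would run add_dummy_tags over the corpus before the tagged stage and
strip_dummy_tags over that stage's output, assuming the output keeps the
same token/tag layout.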

For tagged input, I don't think the AT&T Stochastic Tagger is ftp-able
anymore (yes, it is described in Church88). The tagger format is described
in the Xtract manual. You can use any other tagger as long as you convert
its output into the format used by Xtract.
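
A conversion step of that kind might look like the Python sketch below. Both
formats here are placeholders: I am assuming a tagger that writes word_TAG
tokens, and the output template "{word}/{tag}" is only a stand-in for
whatever layout the Xtract manual specifies. The function name is
hypothetical as well.

```python
def convert_tagged_line(line, in_sep="_", out_template="{word}/{tag}"):
    """Rewrite each word<in_sep>TAG token using out_template.

    in_sep and out_template are assumptions; set them to match your tagger
    and the format given in the Xtract manual.
    """
    converted = []
    for tok in line.split():
        # Split on the last separator so words containing in_sep survive.
        word, found, tag = tok.rpartition(in_sep)
        if not found:
            raise ValueError(f"token without a tag: {tok!r}")
        converted.append(out_template.format(word=word, tag=tag))
    return " ".join(converted)


if __name__ == "__main__":
    print(convert_tagged_line("the_DT stock_NN market_NN"))
```

Splitting on the last separator is a design choice so that a word which
itself contains the separator character is not cut in the wrong place.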

pascale