Corpora: Chinese (word segmented)

Bill Teahan (wjt@cs.waikato.ac.nz)
Thu, 24 Sep 1998 09:12:06 +1200

Does anyone know of any Chinese text corpora that explicitly mark where
the word boundaries are? I wish to run some word segmentation experiments
applying a compression-based approach to the problem (e.g. the same
approach for English achieves 99% accuracy) and the more training data
I can get my hands on, the better the results.

Bill Teahan
Department of Computer Science
University of Waikato
Hamilton, New Zealand