Corpora: robustness in statistical methods

Sandra Kuebler (kuebler@sfs.nphil.uni-tuebingen.de)
Fri, 10 Sep 1999 17:06:17 +0200

Dear all,

One of the major assets of statistical approaches in NLP is their
robustness to errors in the training data. I was wondering whether anybody
has done research on the effect of error rates on the performance of
the trained system. I can imagine that with a large training set an
error rate of 3-5% would not really make a difference, but if the
training set is rather small, things might be different.
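(The intuition can be sketched with a toy simulation -- a hypothetical setup of my own, not taken from any published study: flip a fraction of the training labels for a simple threshold classifier on synthetic two-class data, and compare test accuracy for a small vs. a large training set.)

```python
import random

random.seed(0)

def make_data(n):
    # Balanced synthetic data: feature drawn from one of two
    # unit-variance Gaussians (means 0 and 2); label = which Gaussian.
    return [(random.gauss(2.0 * (i % 2), 1.0), i % 2) for i in range(n)]

def flip_labels(data, rate):
    # Corrupt a fraction of the training labels at random.
    return [(x, 1 - y) if random.random() < rate else (x, y) for x, y in data]

def train_threshold(data):
    # Toy learner: decision threshold = midpoint of the per-class means.
    mean = lambda xs: sum(xs) / len(xs)
    return (mean([x for x, y in data if y == 0]) +
            mean([x for x, y in data if y == 1])) / 2

def accuracy(thresh, data):
    # Predict class 1 when the feature exceeds the threshold.
    return sum((x > thresh) == (y == 1) for x, y in data) / len(data)

test_set = make_data(2000)
results = {}
for n_train in (20, 2000):          # small vs. large training set
    for noise in (0.0, 0.05):       # clean vs. 5% label errors
        # Average over 50 freshly drawn (and corrupted) training sets.
        accs = [accuracy(train_threshold(flip_labels(make_data(n_train), noise)),
                         test_set)
                for _ in range(50)]
        results[(n_train, noise)] = sum(accs) / len(accs)
        print(f"n={n_train:4d} noise={noise:.2f} "
              f"mean acc={results[(n_train, noise)]:.3f}")
```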

I am aware of Walter Daelemans's publications on memory-based learning,
where leaving out dubious cases causes a drop in performance. Has
anybody else done work on this?

Any hints would be appreciated.

Thanks in advance,

Sandra