Re: Corpora: Size of a representative corpus

Ted E. Dunning (ted@aptex.com)
Thu, 20 Aug 1998 14:07:38 -0700

Efron and Thisted (and earlier, Good and Turing) have analyzed this
problem.

See the following article for a discussion of the problem with further
references.

@article{efron87,
author={Bradley Efron and Ronald Thisted},
year=1987,
title={Did Shakespeare write a newly discovered poem?},
journal={Biometrika},
volume=74,
pages={445--455}
}
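As a rough illustration of the Good-Turing side of this, here is a minimal sketch (not code from the article). The Good-Turing estimate of the probability that the next token is a type not yet seen is n_1/N, where n_1 is the number of types seen exactly once and N is the number of tokens. The function name good_turing_unseen_mass is hypothetical, and whitespace tokenisation is only an assumption for the example.

```python
from collections import Counter


def good_turing_unseen_mass(tokens):
    """Good-Turing estimate of P(next token is an unseen type) = n_1 / N.

    n_1 is the number of types that occur exactly once (hapax legomena)
    and N is the number of tokens. Returns 0.0 for an empty sample.
    """
    counts = Counter(tokens)                          # type -> frequency
    n = len(tokens)                                   # N, number of tokens
    n1 = sum(1 for c in counts.values() if c == 1)    # hapax count
    return n1 / n if n else 0.0


if __name__ == "__main__":
    # Assumed toy sample, tokenised on whitespace.
    sample = "the cat sat on the mat with the hat".split()
    # 6 hapaxes (cat, sat, on, mat, with, hat) out of 9 tokens, about 0.667
    print(good_turing_unseen_mass(sample))
```

A high value means the sample is still far from saturating its vocabulary, so many more tokens will be needed before the type count levels off.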

>>>>> "ts" == Tony Berber Sardinha <tony4@uol.com.br> writes:

ts> Also, how could we estimate the number of tokens needed to
ts> make up for 50,001 types?
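One possible way to attack this question, sketched here under stated assumptions rather than as the article's exact method, is the Good-Toulmin series from the same Good-Turing line of work. If n_r types occur exactly r times in a sample of N tokens, the expected number of new types in a further t*N tokens is approximately the sum over r of (-1)^(r+1) t^r n_r. The raw series is only trustworthy for t <= 1, so the hypothetical helper tokens_for_types below searches t on a grid over (0, 1] and returns None if the target would need more than double the current corpus. The function names and the grid search are illustrative assumptions.

```python
from collections import Counter


def good_toulmin_new_types(freq_of_freq, t):
    """Expected number of new types in t*N further tokens (0 < t <= 1).

    freq_of_freq maps r -> n_r, the number of types seen exactly r times.
    Uses the Good-Toulmin series sum_{r>=1} (-1)^(r+1) * t^r * n_r.
    """
    return sum((-1) ** (r + 1) * t ** r * n_r for r, n_r in freq_of_freq.items())


def tokens_for_types(tokens, target_types, steps=1000):
    """Approximate corpus size (in tokens) whose predicted type count
    reaches target_types, searching t on a grid of `steps` points in (0, 1].

    Returns len(tokens) if the sample already has enough types, and None
    if the target lies beyond doubling the corpus (t > 1), where the
    series is unstable and should not be trusted.
    """
    counts = Counter(tokens)
    n = len(tokens)                          # N, tokens observed
    v = len(counts)                          # V, types observed
    if target_types <= v:
        return n                             # already reached in the sample
    freq_of_freq = Counter(counts.values())  # r -> n_r
    for i in range(1, steps + 1):
        t = i / steps
        if v + good_toulmin_new_types(freq_of_freq, t) >= target_types:
            return round(n * (1 + t))
    return None
```

For the figure in the question this would be called as tokens_for_types(corpus_tokens, 50001), with corpus_tokens being your tokenised sample. A None result means the sample is too small to extrapolate that far with this simple series.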