Corpora: representative training text (fwd)

British National Corpus (natcorp@computing-services.oxford.ac.uk)
Tue, 5 Oct 1999 11:09:20 +0100 (BST)

I received the following enquiry and thought someone on this
list might be able to help the enquirer. If you can, please reply
to him directly.

Lou

---------- Forwarded message ----------
Date: Tue, 5 Oct 1999 00:40:25 -0400
From: bob dagit <bd@dialisdn.com>
To: "'natcorp@oucs.ox.ac.uk'" <natcorp@oucs.ox.ac.uk>
Subject: representative training text

Since I am using a standard computer dictation program, I want to acquire a
sample text for training it with the vocabulary-building subprogram. That
text should contain the 20,000-30,000 most frequent English words in
representative frequencies, perhaps with rounded statistical representation
for the least frequent ones, so that I do not spend too much time reading a
gigantic sample text with more than enough of the most frequent words.
Can you point me to such a document (or documents) for download or for a
modest purchase price?
Thanks
