Re: Corpora: software for sampling and analysing corpus

Ken Litkowski (ken@clres.com)
Wed, 08 Oct 1997 15:59:07 -0700

Jean Hudson wrote:
>
> Can anyone recommend software (apart from Wordsmith) or computational
> methods for doing vocabulary analysis of samples of text to control the
> balance within a large text corpus?
>
> I would like to be able to take a sample of, say, 100,000 words and
> see how many different word forms there are within it. Also, I would
> like to see how the word frequencies within the sample match up with
> a control list of frequencies taken from a larger mixed-text corpus.
>
> It would also be useful to have a list of the words that occur
> significantly more frequently within the sample than they do
> within the language as a whole.
>

The MCCA content analysis portion of my DIMAP software may provide
much of the information you are after. My web page has a link to a
fairly complete description of the functionality of this component.
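
For a quick check that does not depend on any package, the three counts
you describe can be roughed out in a few lines. What follows is only a
minimal sketch in Python. It is not a description of how MCCA or DIMAP
works. The tokenizer is deliberately crude, the function names
(tokenize, log_likelihood, overused_words) are made up for illustration,
and the significance measure is an assumption on my part: Dunning's
log-likelihood statistic (G2), with 6.63 as the cutoff for p < 0.01 at
one degree of freedom.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Crude tokenizer (an assumption): lowercase, then keep runs of
    # letters with an optional internal apostrophe, e.g. "don't".
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def log_likelihood(a, b, c, d):
    # G2 for a word seen a times in a c-token sample and b times in a
    # d-token control corpus. Expected counts assume the word is spread
    # evenly across both.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def overused_words(sample_counts, control_counts, threshold=6.63):
    # Words relatively more frequent in the sample than in the control
    # list, with G2 at or above the threshold, strongest first.
    c = sum(sample_counts.values())
    d = sum(control_counts.values())
    hits = []
    for word, a in sample_counts.items():
        b = control_counts.get(word, 0)
        if a / c > b / d:
            g2 = log_likelihood(a, b, c, d)
            if g2 >= threshold:
                hits.append((word, a, b, g2))
    return sorted(hits, key=lambda hit: -hit[3])


sample_counts = Counter(tokenize("the cat sat on the mat"))
n_tokens = sum(sample_counts.values())  # running words in the sample
n_types = len(sample_counts)            # different word forms
```

With a real 100,000-word sample, n_types gives the number of different
word forms, and overused_words(sample_counts, control_counts) gives the
list of words that stand out against the control frequencies. The
control list just needs to be a mapping from word form to count.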

Best,
Ken

-- 
Ken Litkowski                         TEL.: 301-926-5904
CL Research                           EMAIL: ken@clres.com
20239 Lea Pond Place                    
Gaithersburg, MD 20879-1270 USA       Home Page: http://www.clres.com