Re: Corpora: software for sampling and analysing corpus
Ken Litkowski (ken@clres.com)
Wed, 08 Oct 1997 15:59:07 -0700
Jean Hudson wrote:
>
> Can anyone recommend software (apart from Wordsmith) or computational
> methods for doing vocabulary analysis of samples of text to control the
> balance within a large text corpus?
>
> I would like to be able to take a sample of, say, 100,000 words and
> see how many different word forms there are within it. Also, I would
> like to see how the word frequencies within the sample match up with
> a control list of frequencies taken from a larger mixed-text corpus.
>
> It would also be useful to have a list of the words that occur
> significantly more frequently within the sample than they do
> within the language as a whole.
>
The MCCA content analysis component of my DIMAP software may provide
much of the information you are looking for. My web page has a link to
a fairly complete description of what this component does.
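
For readers who want to try the three tasks in the question by hand
(counting distinct word forms, comparing against a control list, and
finding words over-represented in the sample), here is a minimal Python
sketch. It is independent of DIMAP and MCCA and does not describe how
either works. The tokeniser, the function names, and the choice of the
log-likelihood (G2) statistic with a 3.84 cutoff (chi-square, 1 degree
of freedom, p = 0.05) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes.

    A deliberately crude tokeniser, assumed here for illustration only.
    """
    return re.findall(r"[a-z']+", text.lower())


def log_likelihood(a, b, n1, n2):
    """G2 for a word seen a times in n1 sample tokens, b times in n2 reference tokens."""
    # Expected counts if the word were equally frequent in both corpora.
    e1 = n1 * (a + b) / (n1 + n2)
    e2 = n2 * (a + b) / (n1 + n2)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2.0 * g2


def overused_words(sample, reference, threshold=3.84):
    """Return (word, sample_count, reference_count, g2) for words that are
    relatively more frequent in the sample and whose G2 meets the threshold.

    Both arguments are non-empty Counters of word frequencies.
    """
    n1 = sum(sample.values())
    n2 = sum(reference.values())
    hits = []
    for word, a in sample.items():
        b = reference.get(word, 0)
        # Skip words that are not over-represented in the sample.
        if a / n1 <= b / n2:
            continue
        g2 = log_likelihood(a, b, n1, n2)
        if g2 >= threshold:
            hits.append((word, a, b, g2))
    # Strongest over-representation first.
    return sorted(hits, key=lambda h: h[3], reverse=True)
```

With sample = Counter(tokenize(text)), len(sample) is the number of
different word forms and sum(sample.values()) is the number of running
words. The reference Counter would be built from the control frequency
list of the larger mixed-text corpus.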
Best,
Ken
--
Ken Litkowski                     TEL.: 301-926-5904
CL Research                       EMAIL: ken@clres.com
20239 Lea Pond Place
Gaithersburg, MD 20879-1270 USA   Home Page: http://www.clres.com