Re: concordance

John Milton (lcjohn@uxmail.ust.hk)
Mon, 20 Jan 1997 22:02:45 +0800 (HKT)

> I am searching for a program which extracts lexical collocations
> and lexical compounds from text.

If you're looking for something on a PC, the only program I know of is
WordSmith, but it isn't free. Go to the OUP site --
http://www1.oup.co.uk/cite/oup/elt/software/wsmith/
for a demo version that you then pay to upgrade. You do a KWIC concordance
of the search string and then use the 'cluster' feature to produce a
'summary' of the concordance (clusters of 2, 3 or 4 words; you choose
the option).
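
For anyone unfamiliar with the term, a KWIC (keyword-in-context)
concordance lists every occurrence of the search string with some text
on either side. A minimal Python sketch of the idea follows; it is not
how WordSmith does it, and the function name `kwic`, the `width`
parameter and the sample sentence are all made up for illustration.

```python
import re

def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of keyword.

    Hypothetical helper for illustration only; not WordSmith's method.
    """
    # Collapse line breaks and runs of spaces so each hit fits on one line.
    text = " ".join(text.split())
    lines = []
    # Whole-word, case-insensitive match on the search string.
    pattern = r"\b" + re.escape(keyword) + r"\b"
    for m in re.finditer(pattern, text, re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

# Invented sample text, not taken from any corpus.
sample = ("And indeed the market fell. Indeed, the index rose later, "
          "and indeed it recovered.")
for line in kwic(sample, "indeed", width=20):
    print(line)
```

Each printed line holds one hit, with the keyword in the same column so
the left and right contexts can be scanned by eye.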

The two 2-grams containing 'indeed' that occur more than 5 times in a
million words from a Hong Kong newspaper are:
and indeed (6)
indeed the (6)
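
A count like the one above can be approximated with a few lines of
Python. This is only a rough sketch under my own assumptions: the
tokenizer is a crude regular expression, n-grams are allowed to run
across sentence boundaries, and the names `clusters`, `n` and
`min_count` are invented here. WordSmith's 'cluster' feature may well
work differently.

```python
import re
from collections import Counter

def clusters(text, keyword, n=2, min_count=1):
    """Count n-word sequences that contain keyword, most frequent first.

    Hypothetical helper for illustration; min_count is an inclusive
    threshold, so "more than 5 times" corresponds to min_count=6.
    """
    # Crude tokenizer (an assumption): lowercase runs of letters/apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i in range(len(words) - n + 1):
        gram = words[i:i + n]
        # Keep only the n-grams that include the search word.
        if keyword in gram:
            counts[" ".join(gram)] += 1
    return [(g, c) for g, c in counts.most_common() if c >= min_count]

# Invented sample text, not the newspaper corpus mentioned above.
sample = ("and indeed the price rose and indeed the price fell "
          "but indeed it held")
for gram, count in clusters(sample, "indeed", n=2, min_count=2):
    print(f"{gram} ({count})")
```

Setting n to 3 or 4 gives the longer clusters; raising `min_count`
filters out the one-off combinations.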

On a related note: has the performance of such 'clustering' algorithms
been compared? Is there anyone on the list who can recommend a
'state-of-the-art' algorithm, or source code (preferably implemented in
C++), to extract these 'clusters'?
__________________________________________________
John Milton
The Hong Kong University of Science & Technology
email: lcjohn@uxmail.ust.hk
__________________________________________________