Re: Corpora: MWUs and frequency; try Relative Frequency

Vasileios Hatzivassiloglou
Mon, 12 Oct 1998 09:56:20 -0400 (EDT)

In addition to the references that Ted mentions (which are very
relevant), may I point to some of our own work on this same topic? In
1996, we published an article in Computational Linguistics where we
examined several word association metrics, including two variants of mutual
information and the Dice coefficient, and we argued that the empirically
observed better performance of the latter was theoretically justified.
We explained this mainly on the basis of the asymmetry that the Dice
coefficient has in treating matches where both words have been seen
versus matches where neither of the two words has been seen. Unfortunately,
we never got around to putting up an online version, but the article
shouldn't be that hard to find.
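
To make the asymmetry point concrete, here is a minimal sketch (not the code
from the article) that computes the Dice coefficient and an average mutual
information score from a 2x2 table of co-occurrence counts. The function
names and the cell names n11, n10, n01, n00 are my own for illustration:
n11 counts contexts where both words were seen, n00 contexts where neither
was seen, and n10/n01 contexts where only one was seen. I am also assuming
the usual textbook definitions of both measures.

```python
import math


def dice(n11, n10, n01, n00):
    """Dice coefficient 2*f(x,y) / (f(x) + f(y)).

    n00 is accepted only to keep the signature uniform; it is never used,
    so contexts where neither word appears cannot change the score.
    """
    fx = n11 + n10  # contexts containing word x
    fy = n11 + n01  # contexts containing word y
    if fx + fy == 0:
        return 0.0
    return 2.0 * n11 / (fx + fy)


def average_mi(n11, n10, n01, n00):
    """Average mutual information over all four cells of the table.

    Assumed definition: sum of p(i,j) * log2(p(i,j) / (p(i) * p(j))),
    skipping empty cells. Here 1-1 and 0-0 matches are treated alike.
    """
    n = n11 + n10 + n01 + n00
    rows = (n11 + n10, n01 + n00)  # x present, x absent
    cols = (n11 + n01, n10 + n00)  # y present, y absent
    cells = ((n11, 0, 0), (n10, 0, 1), (n01, 1, 0), (n00, 1, 1))
    total = 0.0
    for count, i, j in cells:
        if count > 0:
            total += (count / n) * math.log2(count * n / (rows[i] * cols[j]))
    return total


if __name__ == "__main__":
    table = (10, 5, 5, 80)    # made-up counts, purely illustrative
    flipped = (80, 5, 5, 10)  # same table with "seen" and "not seen" swapped
    print("Dice:      ", dice(*table), dice(*flipped))
    print("Average MI:", average_mi(*table), average_mi(*flipped))
```

Swapping the roles of "seen" and "not seen" leaves the average mutual
information unchanged but changes the Dice score, and adding more contexts
where neither word occurs does not move Dice at all. That is the asymmetry
referred to above.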


,title="Translating Collocations for Bilingual Lexicons: {A} Statistical
,author="Frank Smadja and Kathleen R. McKeown and Vasileios Hatzivassiloglou"
,journal="Computational Linguistics"
,volume="{\bf 22}"