The problem arises in text segmentation and grouping -- the
sentences are the English definitions of alternative partitions of
Thai words (Thai is normally written unsegmented, so a string can
split in more than one way, as in top-end / to-pend). Correct
partitions are more likely to be related to each other (or to a
neighboring word) than incorrect partitions.
Yes, I know that any number that pops out won't be
particularly meaningful, but it's better than nothing. Note also
that we don't have nice, neat, accurate, one-word English
glosses for the Thai original, so looking for co-occurrence
stats is not an easy alternative.
Perl code working from WordNet data, or some publicly available
thesaurus, would be ideal.
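
To make the request concrete, here is a minimal sketch of the kind of score in question. It is in Python rather than Perl, and it uses plain content-word overlap between definitions instead of WordNet or a thesaurus, so it is only a crude stand-in. The stopword list, the function names, and the sample glosses are all invented placeholders.

```python
import re
from itertools import combinations

# Assumption: a tiny hand-picked stopword list; a real one would be larger.
STOPWORDS = {"a", "an", "the", "of", "or", "to", "in", "on", "and",
             "is", "that", "for", "with", "as", "by", "something"}

def content_words(definition):
    """Lowercase a definition and return its set of non-stopword tokens."""
    tokens = re.findall(r"[a-z]+", definition.lower())
    return {t for t in tokens if t not in STOPWORDS}

def relatedness(def_a, def_b):
    """Jaccard overlap of content words: 0.0 (disjoint) to 1.0 (identical)."""
    a, b = content_words(def_a), content_words(def_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def partition_score(definitions):
    """Mean pairwise relatedness of the definitions in one partition."""
    pairs = list(combinations(definitions, 2))
    if not pairs:
        return 0.0
    return sum(relatedness(a, b) for a, b in pairs) / len(pairs)

def best_partition(candidates):
    """Return the name of the candidate partition with the highest score."""
    return max(candidates, key=lambda name: partition_score(candidates[name]))

# Invented placeholder glosses, for illustration only.
candidates = {
    "top-end": ["the highest part or point of something",
                "the final part or point of something"],
    "to-pend": ["in the direction of a place",
                "remain undecided or unsettled"],
}

if __name__ == "__main__":
    for name, defs in candidates.items():
        print(name, partition_score(defs))
    print("preferred:", best_partition(candidates))
```

Word overlap only notices definitions that share vocabulary; a WordNet or thesaurus lookup could replace relatedness() without changing the rest.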
Thanks in advance,
Doug Cooper
__________________________________________________
1425 VP Tower, 21/45 Soi Chawakun
Rangnam Road, Rajthevi, Bangkok, 10400
doug@th.net (662) 246-8946 fax (662) 246-8789
Southeast Asian Software Research Center, Bangkok
http://seasrc.th.net --> SEASRC Web site
http://seasrc.th.net/sealang --> SEALANG Web site