Re: Corpora: query

Ted E. Dunning (
Fri, 29 May 1998 09:52:55 -0700

a simpler approach would be to intersect an english word-list with a
word-list from another language (such as french, spanish, german, or
latin). such word-lists are readily available. deaccenting the
non-english word-list first would probably be a good idea, since
english spellings of borrowed words usually drop the accents.
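
a minimal sketch of the intersection in python, assuming each word-list
is just an iterable of strings, one word per entry. the names deaccent
and shared_words and the tiny sample lists are made up for illustration.
deaccenting here means unicode decomposition followed by dropping the
combining marks, which handles é, ï, ñ and the like but leaves letters
such as german ß untouched.

```python
import unicodedata


def deaccent(word):
    """Strip accents by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def shared_words(english_words, other_words):
    """Return the words found in both lists after normalization.

    Only the non-english list is deaccented; both lists are lowercased
    and stripped of surrounding whitespace.
    """
    english = {w.strip().lower() for w in english_words if w.strip()}
    other = {deaccent(w.strip().lower()) for w in other_words if w.strip()}
    return english & other


if __name__ == "__main__":
    # hypothetical sample data; real word-lists would be read from files
    english = ["cafe", "resume", "table", "dog", "naive"]
    french = ["café", "résumé", "table", "chien", "naïve"]
    print(sorted(shared_words(english, french)))
    # prints ['cafe', 'naive', 'resume', 'table']
```

note that a plain intersection also picks up accidental matches, words
that happen to be spelled the same in both languages without being
borrowings, so the result is a candidate list rather than a final answer.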

this approach will, of course, miss words like skosh and honcho,
which are taken from japanese but whose english spellings will not
appear verbatim in a japanese word-list. there are approaches which can
approximate the matching between english words and japanese words, but
they are relatively difficult to implement well.