Corpora: text matching applications

John Milton (lcjohn@uxmail.ust.hk)
Fri, 30 Jan 1998 12:29:41 +0800 (HKT)

There was a lot of interest in this topic a while back, so the following
tidbit from EduPage may be relevant:

SLEUTH FINDS PLENTY OF PLAGIARISM ON THE NET
Cancer researcher Marek Wronski used the National Library of Medicine's
PubMed to find instances of 30 allegedly plagiarized medical papers
ostensibly authored by a Polish chemical engineer. PubMed offers a
push-button function labeled "find related articles," which uses
statistical algorithms to identify root words in an article, and then
searches for similar instances of the root words in other articles.
Additional research by Wronski has unearthed 29 more suspect papers. The
engineer, who claimed to have authored 125 articles in a 13-year career,
now faces charges of plagiarism. (Science 23 Jan 98)
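
For list members curious how a "find related articles" style comparison
might work in principle, here is a minimal sketch. It is not PubMed's
actual algorithm, which the item above does not describe in detail. It
assumes a deliberately crude suffix-stripping rule to get root words, a
small made-up stopword list, and cosine similarity over root-word counts.
All names in it (SUFFIXES, STOPWORDS, root, root_counts, similarity) are
hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Assumption: a very crude suffix list stands in for a real stemmer.
SUFFIXES = ("ations", "ation", "ings", "ing", "ed", "s")

# Assumption: a tiny illustrative stopword list, not a standard one.
STOPWORDS = {"the", "of", "a", "an", "and", "in", "to", "is", "are",
             "for", "on", "with"}


def root(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def root_counts(text):
    """Count root words in a text, ignoring case and stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(root(w) for w in words if w not in STOPWORDS)


def similarity(text_a, text_b):
    """Cosine similarity between the root-word counts of two texts."""
    a, b = root_counts(text_a), root_counts(text_b)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

Scoring one suspect paper against every other abstract in a collection
and sorting by the result would give a ranked list of candidates for a
human to read. A high score only flags overlap in vocabulary; it does not
by itself establish plagiarism.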
__________________________________________________
John Milton
The Hong Kong University of Science and Technology