Re: Corpora: Plagiarism detection

From: Tom Vanallemeersch (Tom.Vanallemeersch@lant.be)
Date: Mon May 08 2000 - 18:37:50 MET DST

    Paul Clough wrote:
    >
    > Hi,
    >
    > Does anyone know of any current plagiarism detection projects currently
    > going on? I know of Malcolm Coulthard and Copycatch, but are there any other
    > projects? Also, I would like to do some statistical work on plagiarised
    > work, but does anyone know where I can find any data? I am after plagiarism
    > of natural language rather than software plagiarism. Any help would be very
    > much appreciated.
    >
    > Thanks,
    >
    > Paul Clough.
    > Postgraduate at The University of Sheffield,
    > England.

    A while ago, I wrote a program that detects strings shared by two
    texts. It runs under Unix and takes two filenames as arguments. The
    output is a list of the shared strings, ordered by length, with
    information on their occurrences in each text. A string is only
    listed if it appears in varying contexts (e.g. "with respect to"
    would only appear if it is preceded or followed by different words
    in the texts). When the texts are very similar, a shared string
    may also be a very large text block.
    If you think this is useful, I can send you a copy of the program.
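
    For readers who want to experiment with the idea, here is a minimal
    Python sketch of this kind of comparison. It is not the program
    described above, whose internals are not given here. It assumes that
    texts are split into words on whitespace, that a string is "shared"
    when it is a run of at least two identical words, and that "variable
    context" means the run cannot be extended to the left or right in at
    least one pair of occurrences. The names shared_strings and
    min_words are hypothetical.

```python
import sys
from collections import defaultdict


def shared_strings(words_a, words_b, min_words=2):
    """List word sequences shared by two texts, longest first.

    Assumption: a sequence is reported only where it is a maximal match,
    i.e. the neighbouring words differ (or the text ends) on both sides.
    Returns (phrase, start offsets in A, start offsets in B) tuples, with
    offsets counted in words.
    """
    hits = defaultdict(lambda: (set(), set()))
    # prev[j] holds the length of the matching run ending at (i - 1, j - 1).
    prev = [0] * (len(words_b) + 1)
    for i, word_a in enumerate(words_a):
        cur = [0] * (len(words_b) + 1)
        for j, word_b in enumerate(words_b):
            if word_a != word_b:
                continue
            run = prev[j] + 1  # run already reaches as far left as it can
            cur[j + 1] = run
            at_end = i + 1 == len(words_a) or j + 1 == len(words_b)
            # Keep the run only if the following words differ (or a text ends).
            if run >= min_words and (at_end or words_a[i + 1] != words_b[j + 1]):
                starts_a, starts_b = hits[tuple(words_a[i - run + 1:i + 1])]
                starts_a.add(i - run + 1)
                starts_b.add(j - run + 1)
        prev = cur
    # Order by length in words (descending), then alphabetically.
    ranked = sorted(hits.items(), key=lambda item: (-len(item[0]), item[0]))
    return [(" ".join(p), sorted(a), sorted(b)) for p, (a, b) in ranked]


def main(path_a, path_b):
    with open(path_a, encoding="utf-8") as fa, open(path_b, encoding="utf-8") as fb:
        words_a, words_b = fa.read().split(), fb.read().split()
    for phrase, starts_a, starts_b in shared_strings(words_a, words_b):
        print(f"{len(phrase.split()):4d}  {phrase!r}  A:{starts_a}  B:{starts_b}")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

    Saved under a hypothetical name such as shared.py, it would be run
    as "python shared.py a.txt b.txt". The comparison takes time
    proportional to the product of the two text lengths, so it suits
    essays and articles rather than large corpora; a suffix-based index
    would be the usual choice at that scale.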

    Cheers,

    Tom

    -- 
    LANT nv/sa, Research Park Haasrode, Interleuvenlaan 21, B-3001 Leuven
    mailto:Tom.Vanallemeersch@lant.be               Phone: ++32 16 405140
    http://www.lant.be/                             Fax: ++32 16 404961
    


