Corpora: 4-month Research officer, text processing

From: Eric Atwell
Date: Mon Jan 14 2002 - 22:55:43 MET

    Leeds University: 4-month Visiting Research Fellowship

    Customising a copying-identifier for Biomedical Science student reports

    We have funds for a 4-month temporary researcher on an applied IT project.
    If you know a postgrad or postdoc wanting a 4-month paid practical placement
    in a leading research school, OR someone writing up a thesis and looking for
    an initial research post, OR an academic seeking a funded 4-month sabbatical,
    please pass this on:

    Project Leaders: Eric Atwell (School of Computing), Paul Gent (School of
    Biomedical Sciences), Clive Souter (Director of Joint Honours in Science).

    Aim of Project:
    To develop a system for detecting student copying in laboratory practical
    reports, customised to a specific genre/subject, in our case initially
    Biomedical Science first-year reports. The project will involve:
    - requirements analysis, collating a Test Corpus of student reports;
    - survey of available systems and how well these match requirements spec;
    - implementation, based as much as possible on existing software;
    - testing and evaluation on Test Corpus of Biomedical science student reports;
    - writing documentation and user manual for future use and maintenance.

    The Project Output will be: a system customised for this specialised detection
    task, and documentation of the methodology used, to aid and encourage further
    development of customisations for other genres. An indirect extra output will
    hopefully be increased appreciation (and take-up) by lecturers of the further
    possibilities opened up for computer-based processing of student reports.

    We are aware of many available plagiarism detectors (e.g. turnitin and others
    investigated by HEFCE), but Biomedical Science teaching staff believe these
    generic systems are generally too sophisticated / complicated for this specific
    task:
    "What we need is something which will go through a directory of Word
    .doc files, strip or ignore formatting and just pick out areas of
    correspondence. In other words copying rather than plagiarism as we know it.
    When you have 200+ practical reports on one exercise it's bound to happen
    and at present it is spotted by chance alone. If we can nip it in the bud
    at level one we can think about the more esoteric stuff like turnitin later."
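
    The requirement quoted above can be illustrated with a minimal sketch. It
    is not the project's method, only one simple way to "pick out areas of
    correspondence": compare every pair of reports and count the runs of n
    consecutive words they share. It assumes the Word .doc files have already
    been converted to plain text by a separate tool; the function names
    (word_ngrams, shared_passages), the sample reports and the choice of n=5
    are all hypothetical.

```python
import re
from itertools import combinations


def word_ngrams(text, n=5):
    """Return the set of n-word sequences in text, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_passages(reports, n=5, min_shared=1):
    """List report pairs that share at least min_shared word n-grams.

    reports maps a report name to its plain text (formatting already stripped).
    Returns (name_a, name_b, shared_count) tuples, most overlap first.
    """
    grams = {name: word_ngrams(text, n) for name, text in reports.items()}
    hits = []
    for a, b in combinations(sorted(grams), 2):
        shared = grams[a] & grams[b]
        if len(shared) >= min_shared:
            hits.append((a, b, len(shared)))
    hits.sort(key=lambda hit: -hit[2])
    return hits


# Hypothetical sample data standing in for converted student reports.
reports = {
    "alice.txt": "The enzyme activity increased rapidly as the temperature "
                 "rose to forty degrees.",
    "bob.txt": "We saw that the enzyme activity increased rapidly as the "
               "temperature rose in our tube.",
    "carol.txt": "Our results show a slow decline in reaction rate after heating.",
}

for name_a, name_b, count in shared_passages(reports):
    print(name_a, name_b, count)
```

    With 200+ reports this pairwise comparison is still only about 20,000
    pairs, so a simple approach of this kind may be adequate; the threshold
    min_shared would need tuning on the Test Corpus, since phrases from the
    practical handout will legitimately recur in many reports.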

    The project must be completed before 30 June 2002;
    we hope to appoint a Project Officer for 4 months, February-May 2002.
    Candidates must be EU citizens who do not require a work permit in the UK.

    FURTHER DETAILS and APPLICATIONS: contact Eric Atwell,
    OR Paul Gent, OR Clive Souter.


    Eric Atwell, Distributed Multimedia Systems MSc Tutor & SOCRATES Tutor
    School of Computing, University of Leeds, LEEDS LS2 9JT
    TEL: 0113-2335430  MOBILE: 0775-1039104 FAX: 0113-2335468
