[Corpora-List] SHARES document similarity system

From: Andrew Kehoe (andrew@rdues.liv.ac.uk)
Date: Wed Mar 24 2004 - 15:52:14 MET

    Dear Colleague

    For the past 3 years the Research and Development Unit for English
    Studies has been working on an EPSRC-funded project called SHARES
    (System of Hypermatrix Analysis, Retrieval, Evaluation and
    Summarisation). The aim of the project was to test the hypothesis that
    similar patterns of lexical repetition are sufficiently maintained
    across differently authored documents on similar topics to support a
    high-performance retrieval engine.

    This will be of interest to people working on document similarity and
    applications of Lexical Cohesion. We have produced an online demo
    system and user guide, and would appreciate your feedback:

             http://www.rdues.liv.ac.uk/sharesguide

    The demo system uses a small test corpus covering 11 topics, with 3
    news articles on each topic. It lets you compare a pair of articles,
    or one article against all the other articles in the test corpus.
    Stemming and weighting options are available. This is a cut-down
    version of our full SHARES software, designed for faster online access.
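
    For readers unfamiliar with this kind of comparison, the sketch below
    illustrates the general idea of scoring articles by shared word
    repetition. It is not the SHARES hypermatrix method: it substitutes
    plain cosine similarity over word counts, and the function names, the
    crude suffix-stripping "stemmer" and the three-line corpus are all
    hypothetical, chosen only for illustration.

```python
import re
from collections import Counter
from math import sqrt


def tokens(text, stem=False):
    """Lower-case word tokens; optionally apply a crude suffix strip.

    The suffix strip is an assumed stand-in for a real stemmer.
    """
    words = re.findall(r"[a-z]+", text.lower())
    if stem:
        words = [re.sub(r"(ing|ed|es|s)$", "", w) if len(w) > 4 else w
                 for w in words]
    return words


def repetition_similarity(a, b, stem=False):
    """Cosine similarity of the two texts' word-count vectors."""
    ca, cb = Counter(tokens(a, stem)), Counter(tokens(b, stem))
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm = (sqrt(sum(v * v for v in ca.values()))
            * sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def rank_against_corpus(query_id, corpus, stem=False):
    """Compare one article with every other article, best match first."""
    scored = [(repetition_similarity(corpus[query_id], text, stem), doc_id)
              for doc_id, text in corpus.items() if doc_id != query_id]
    return sorted(scored, reverse=True)


# Hypothetical miniature corpus: two articles on one topic, one on another.
corpus = {
    "a1": "The bank raised interest rates as inflation rose",
    "a2": "Interest rates rise again as the bank fights inflation",
    "b1": "The team won the cup final after extra time",
}
ranking = rank_against_corpus("a1", corpus)
```

    Here `ranking` lists the other articles in descending order of score,
    so the article on the same topic as "a1" should come first. Weighting
    (for example, down-weighting very common words) is left out to keep
    the sketch short.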

    An anonymous feedback form is provided on our website:
    http://www.rdues.liv.ac.uk/sfeedback.shtml. If you prefer, you may
    send comments by email to andrew@rdues.liv.ac.uk.

    Thank you in advance

    Andrew Kehoe
    Research and Development Unit for English Studies
    University of Liverpool
    http://www.rdues.liv.ac.uk
    WebCorp: http://www.webcorp.org.uk


