[Corpora-List] Availability of Resources for Concept-based Cross-Lingual Information Retrieval

From: Paul Buitelaar (paulb@dfki.de)
Date: Thu Oct 02 2003 - 15:51:17 MET DST

    Dear colleague, evaluation resources that were developed within the
    EU/NSF funded project MuchMore on Concept-based Cross-Lingual
    Information Retrieval in the Medical Domain are now freely available
    from the project web site at:

    http://muchmore.dfki.de/resources_index.htm

    Available resources include: a German-English parallel medical
    document collection, corresponding queries and relevance assessments,
    evaluation sets of disambiguated terms, and evaluation lists for
    morphological decomposition of (German) medical terms.

    The project developed a cross-lingual information retrieval system that
    enables users to retrieve documents in English and/or German, given a
    query document in English or German. In the current version of the
    system, query documents are assumed to be German electronic patient
    records and documents to be retrieved are medical scientific abstracts
    in both German and English. The cross-lingual information retrieval task
    has been approached through a mix of methods: semantic annotation, a
    similarity thesaurus, example-based translation, pseudo-relevance
    feedback, and a vector-space model. Along these lines, three retrieval
    systems have been developed that were integrated into a meta-search
    engine with a common user interface (including an extensive query
    construction functionality) and results presentation (including an
    interactive, multidocument summarization functionality).
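
    As a rough illustration of two of the methods named above (the
    vector-space model and pseudo-relevance feedback), here is a minimal
    Python sketch. It is not the MuchMore implementation: the function
    names, the smoothed TF-IDF weighting, the Rocchio-style expansion with
    its parameters k and beta, and the toy documents are all assumptions
    made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would use proper
    # linguistic preprocessing.
    return text.lower().split()


def build_idf(docs):
    # Smoothed inverse document frequency over the collection.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def vectorize(text, idf):
    # Sparse TF-IDF vector; terms unseen in the collection are dropped.
    tf = Counter(tokenize(text))
    return {t: f * idf[t] for t, f in tf.items() if t in idf}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def rank(query_vec, doc_vecs):
    # Document indices ordered by decreasing cosine similarity
    # (ties broken by index).
    scores = [(cosine(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    return [i for _, i in sorted(scores, key=lambda p: (-p[0], p[1]))]


def expand_query(query_vec, doc_vecs, ranking, k=1, beta=0.5):
    # Pseudo-relevance feedback, Rocchio-style: assume the top-k documents
    # are relevant and add a fraction of their term weights to the query.
    expanded = dict(query_vec)
    for i in ranking[:k]:
        for t, w in doc_vecs[i].items():
            expanded[t] = expanded.get(t, 0.0) + beta * w / k
    return expanded


# Toy collection (invented for illustration only).
docs = [
    "heart attack treatment",
    "myocardial infarction heart therapy",
    "knee surgery recovery",
]
idf = build_idf(docs)
doc_vecs = [vectorize(d, idf) for d in docs]

query_vec = vectorize("heart attack", idf)
ranking = rank(query_vec, doc_vecs)
expanded = expand_query(query_vec, doc_vecs, ranking, k=1)
reranking = rank(expanded, doc_vecs)
```

    The expanded query picks up terms from the top-ranked document that
    the original query did not contain, which is the basic idea behind
    pseudo-relevance feedback; a cross-lingual setup would additionally
    need a translation or concept-mapping step, which this sketch omits.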

    The MuchMore prototype (as well as the individual retrieval systems and
    some additional demos on semantic annotation and term clustering) is
    available at:

    http://muchmore.dfki.de/demos_index_new.htm

    At the core of the MuchMore project has been a comparative evaluation of
    the different approaches used for the cross-lingual information
    retrieval task. Overall results show that the best performance may be
    obtained by a combination of corpus-based and concept-based information,
    i.e. using a combination of manually constructed and automatically
    extracted (semantic) resources. Adding manually constructed knowledge
    (through semantic annotation or classification) improves performance,
    although disambiguation has not been shown to further improve
    performance significantly.

    All results are available as project reports and/or as published papers at:

    http://muchmore.dfki.de/pub.htm

    Please contact us in case of further questions. Thanks for your time,

       Paul Buitelaar

       Coordinator MuchMore
       DFKI - Language Technology
       Saarbruecken, Germany

       http://muchmore.dfki.de/
       http://dfki.de/~paulb/
