Re: [Corpora-List] corpus for Spanish and French language

From: Paul McNamee (paulmac@nautilus.jhuapl.edu)
Date: Fri Jun 06 2003 - 17:30:21 MET DST

    You should particularly look at the Cross-Language Evaluation Forum (CLEF)
    project. The CLEF program has been ongoing for ~4 years and has
    developed a re-usable test suite of IR corpora in eight or so European
    languages, including Spanish (~460k docs) and French (~120k docs), that
    I believe can be made available without fee, subject to user agreements.
    Information about CLEF can be found at http://www.clef-campaign.org/
    and the site contains contact information for the project director,
    Carol Peters.

    Best regards,

    - Paul McNamee

    Research and Technology Development Center
    Johns Hopkins University Applied Physics Lab
    11100 Johns Hopkins Road
    Laurel MD 20723-6099 USA
    Voice: +1 443 778 3816
    Fax: +1 443 778 6904
    Email: mcnamee@jhuapl.edu

    On Wed, 4 Jun 2003, Ying Ding wrote:

    > Dear All,
    >
    > We have a small project running here related to a search engine. We need to
    > test this search engine on Spanish and French. We need a corpus for each of
    > these two languages. Do you know where to get one for free or at
    > little cost?
    >
    > Another thing is stop word lists for these two languages. Do you know
    > where to find such lists?
    >
    > Any help will be highly appreciated! I will provide the summary at the end.
    >
    > Best Regards
    > ying
    >
    > Dr. Ying Ding
    > Assistant Professor
    > Next Web Generation Group
    > Institute of Computer Science, University of Innsbruck
    > Technikerstr. 13, A-6020 Innsbruck, Austria
    > Tel: +43 512 507 6112, Fax: +43 512 507 9872
    > http://www.nextwebgeneration.com/



    This archive was generated by hypermail 2b29 : Fri Jun 06 2003 - 17:36:35 MET DST