TREC6 CLIR Track

Paraic Sheridan (sheridan@inf.ethz.ch)
Mon, 3 Mar 1997 13:15:18 +0100 (MET)

(Apologies if you receive this message more than once)

[Please note that the corpora resources described here are at present
only available to those who take part in the TREC retrieval experiments]

Dear Colleagues,

This year's Text Retrieval Conference (TREC6), organised by the National
Institute of Standards and Technology (NIST), will include a track aimed
at evaluating performance on Cross-Language Information Retrieval (CLIR)
tasks, where retrieval queries are submitted in one language and documents
are retrieved in a second language. If you are not familiar with the TREC
series of conferences, more information can be found at:

http://www-nlpir.nist.gov/TREC/

The current draft of the Cross-Language retrieval task is as follows:

The Cross-Language Information Retrieval (CLIR) track requires
the retrieval of English, German or French documents that are
relevant to topics formatted in these three languages. Participating
groups may choose any cross-language combination, e.g. English
queries -> German documents or French queries -> English documents.
To provide a baseline for each group, the results of the corresponding
monolingual run must also be submitted. For instance, a group submitting
English queries -> German documents must also submit the results of
German queries -> German documents.
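
The baseline rule above can be sketched in a few lines of Python. This is
only an illustration, not part of the track guidelines: the function name
`required_runs` and the representation of a run as a (query language,
document language) pair are assumptions made for this example.

```python
from itertools import permutations

# The three languages named in the draft task description.
LANGUAGES = ("English", "German", "French")


def required_runs(query_lang, doc_lang):
    """Return the runs to submit for one cross-language pair.

    Hypothetical helper: each run is a (query language, document language)
    tuple. The second run is the monolingual baseline in the document
    language, as in the English -> German example of the draft.
    """
    if query_lang == doc_lang:
        raise ValueError("a cross-language run needs two different languages")
    return [(query_lang, doc_lang), (doc_lang, doc_lang)]


# List every possible cross-language combination with its baseline.
for q, d in permutations(LANGUAGES, 2):
    runs = required_runs(q, d)
    print(" + ".join(f"{a} queries -> {b} documents" for a, b in runs))
```

With three languages there are six possible cross-language combinations,
each paired with one monolingual baseline run.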

NIST will provide 25 topics, each available in English, German and
French. Results will be evaluated for automatic adhoc runs (short and
long queries) as well as for manual adhoc runs.

The data for the CLIR track consists of:

English: - AP news, 1988-1990 (from existing TREC data)

German:  - SDA news, 1988-1990, 185,099 documents, 330MB
         - Neue Zuercher Zeitung (NZZ) articles, 1994, 200MB

French:  - SDA news, 1988-1990, 141,656 documents, 250MB

(AP = Associated Press)
(SDA = Schweizerische Depeschenagentur = Swiss news agency)
(NZZ = A Swiss German newspaper)

Note that, although not strictly within the definition of the Cross-Language
task, we also welcome participation by groups who want to do monolingual
retrieval experiments using the above French or German data (groups wishing
to do experiments in English only should participate in the main TREC
task).

It is also likely that the data will be supplemented with alignment
information relevant to the SDA French/German collection, and with some
morphological analysis information, to support retrieval experiments.

If you are interested in participating in the Cross-Language Retrieval
track of TREC6, please send requests to 'sheridan@inf.ethz.ch' for
further information.

Regards,
Paraic Sheridan
ETH Zurich.