[Corpora-List] SMT models trained on EUROPARL

From: Joerg Tiedemann (tiedeman@let.rug.nl)
Date: Thu Dec 16 2004 - 13:15:20 MET


For people interested in MT and alignment:

Models for statistical machine translation trained with GIZA++ on the
EUROPARL corpus are now available from the OPUS homepage:

http://logos.uio.no/cgi-bin/opus/viewcvs.cgi/opus/EUROPARL/wordalign/

I used the standard GIZA++ settings to produce IBM Model 4. So far,
the models for all languages aligned with Dutch are available (in both
directions). Models for the other language pairs will be added as soon
as their training is finished.

There are also files with the complete lists of token links and type
links produced from the intersection of the source-to-target and
target-to-source Viterbi alignments. Token links are stored as XML in
files named SRCTRG.inter.gz, and type links in files named SRCTRG.dic.gz
(with SRC and TRG replaced by the actual language codes). Everything is
encoded in Unicode UTF-8.
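To illustrate what the intersection step does, here is a minimal Python sketch of the general technique. It is an assumption for illustration, not the code used to build the OPUS files, and it does not read their actual XML or dictionary formats. It keeps only the links on which both directional Viterbi alignments agree, then counts the surviving token links as (source word, target word) type links. The function names, the set-of-position-pairs representation and the toy Dutch-English sentences are all hypothetical.

```python
from collections import Counter


def intersect_alignments(src_to_trg, trg_to_src):
    """Return the links found in both directional Viterbi alignments.

    src_to_trg: set of (i, j) pairs, source position i -> target position j
    trg_to_src: set of (j, i) pairs from the reverse direction
    The result uses the (i, j) orientation of src_to_trg.
    """
    # Flip the reverse-direction pairs so both sets use (source, target) order.
    reversed_links = {(i, j) for (j, i) in trg_to_src}
    return src_to_trg & reversed_links


def type_links(sentence_pairs):
    """Aggregate intersected token links into word-pair (type link) counts."""
    counts = Counter()
    for src_tokens, trg_tokens, s2t, t2s in sentence_pairs:
        for i, j in intersect_alignments(s2t, t2s):
            counts[(src_tokens[i], trg_tokens[j])] += 1
    return counts


# Hypothetical toy data: Dutch source, English target.
pairs = [
    (["het", "huis", "is", "klein"], ["the", "house", "is", "small"],
     {(0, 0), (1, 1), (2, 2), (3, 3)},   # Dutch-to-English links (i, j)
     {(0, 0), (1, 1), (2, 2), (3, 2)}),  # English-to-Dutch links (j, i); "small" -> "is" is an error
    (["het", "huis"], ["the", "house"],
     {(0, 0), (1, 1)},
     {(0, 0), (1, 1)}),
]

# The disputed "klein"/"small" link is missing from both directions' agreement,
# so it does not appear among the type links.
print(type_links(pairs))
```

Intersection trades recall for precision: any link proposed by only one direction (like the "small"/"is" error above, but also the correct "klein"/"small" link) is discarded, so the remaining links tend to be reliable but incomplete.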

Please let me know if this is useful to you. It would be nice to know
that this is not just a waste of hard disk space.

best regards,

Jörg

***********/\/\/\/\/\/\/\/\/\/\/\************************************
** Jörg Tiedemann tiedeman@let.rug.nl **
** Alfa-Informatica http://www.let.rug.nl/~tiedeman **
** Rijksuniversiteit Groningen Harmoniegebouw, room 1311-429 **
** Oude Kijk in 't Jatstraat 26 phone: +31 (0)50-363 5935 **
** 9712 EK Groningen fax: +31 (0)50-363 6855 **
*************************************/\/\/\/\/\/\/\/\/\/\/\**********
