[Corpora-List] SMT models trained on EUROPARL

From: Joerg Tiedemann (tiedeman@let.rug.nl)
Date: Thu Dec 16 2004 - 13:15:20 MET

    for people interested in MT and alignment:

    models for statistical machine translation trained with GIZA++ and the
    EUROPARL corpus are now available from the OPUS homepage:

    http://logos.uio.no/cgi-bin/opus/viewcvs.cgi/opus/EUROPARL/wordalign/

    I used the standard settings of GIZA++ to train IBM Model 4. so far
    you can find the models for all languages aligned to Dutch (in both
    directions). models for other language pairs will be made available as
    soon as the training is finished.

    there are also files with the complete list of token links and type links
    produced from the intersection of source-to-target and target-to-source
    Viterbi alignments. token links are in XML in the files called
    SRCTRG.inter.gz and type links are in files called SRCTRG.dic.gz (with SRC
    and TRG replaced by the actual language codes). everything is encoded in
    Unicode (UTF-8).
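
    for readers who have not worked with symmetrized alignments, here is a
    minimal sketch of the idea: keep only the links that both directional
    Viterbi alignments agree on (token links), then count the linked word
    pairs (type links). the function names, the in-memory data layout and
    the toy sentence pair are assumptions made for illustration; this is
    not the code used to build the OPUS files, it does not read the XML or
    .dic.gz formats, and whether the .dic.gz files carry counts is not
    stated here.

```python
from collections import Counter


def intersect_alignments(src2trg, trg2src):
    """Keep only the links proposed by both directional alignments.

    src2trg: iterable of (src_index, trg_index) pairs
    trg2src: iterable of (trg_index, src_index) pairs
    Returns a sorted list of (src_index, trg_index) token links.
    """
    # Flip the target-to-source links so both sets use (src, trg) order.
    flipped = {(s, t) for (t, s) in trg2src}
    return sorted(set(src2trg) & flipped)


def type_links(sentence_pairs):
    """Count word-type pairs over the token links of a corpus.

    sentence_pairs: iterable of (src_tokens, trg_tokens, links) triples,
    where links is a list of (src_index, trg_index) pairs.
    """
    counts = Counter()
    for src_tokens, trg_tokens, links in sentence_pairs:
        for s, t in links:
            counts[(src_tokens[s], trg_tokens[t])] += 1
    return counts


# Toy data, invented for illustration (not taken from EUROPARL).
src = ["het", "huis", "is", "groot"]
trg = ["the", "house", "is", "big"]
s2t = [(0, 0), (1, 1), (2, 2), (3, 3), (3, 2)]  # (src, trg) pairs
t2s = [(0, 0), (1, 1), (2, 2), (3, 3), (1, 0)]  # (trg, src) pairs

links = intersect_alignments(s2t, t2s)
dictionary = type_links([(src, trg, links)])
```

    the intersection favours precision over recall: a link survives only
    if both directions propose it, so the extra one-directional links in
    the toy data are dropped.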

    please let me know if this is useful for you. it would be nice to know
    that this is not just a waste of hard disk space.

    best regards,

    Jörg

    ***********/\/\/\/\/\/\/\/\/\/\/\************************************
    ** Jörg Tiedemann tiedeman@let.rug.nl **
    ** Alfa-Informatica http://www.let.rug.nl/~tiedeman **
    ** Rijksuniversiteit Groningen Harmoniegebouw, room 1311-429 **
    ** Oude Kijk in 't Jatstraat 26 phone: +31 (0)50-363 5935 **
    ** 9712 EK Groningen fax: +31 (0)50-363 6855 **
    *************************************/\/\/\/\/\/\/\/\/\/\/\**********


