[Corpora-List] Summary - sentence aligner script

From: Tony Berber Sardinha (tony4@uol.com.br)
Date: Tue Dec 17 2002 - 17:09:11 MET


    Dear list members

    Thanks to all who replied to my query about sentence aligner scripts:
    Susan Armstrong, Torgny Rasmark, Jean Veronis, François Maniez, Tomaz Erjavec,
    and Marco Baroni.
    Below are the replies that I got:

    >>Susan Armstrong

    We have a publicly available aligner, built for an EU project some years
    ago, at http://www.issco.unige.ch/tools/

    >>Torgny Rasmark

    Vanilla aligner (for DOS):
    http://spraakbanken.gu.se/lb/English/downloads.html

    >>Jean Veronis

    see:

    Gale, W., and Church, K. (1993) "A Program for Aligning Sentences in
    Bilingual Corpora," Computational Linguistics, 19:1, pp. 75-102.

    There is a C program published at the end of the paper. It is available
    from Ken's page at:

    http://www.research.att.com/~kwc/publications.html

    >>François Maniez

    Hello,

    this is not about perl or Unix, but I have written a Word macro that does
    the trick if the original format of your data is an x-column table where x
    is the number of languages included in your parallel corpus (I am currently
    building a medical corpus from files available on the European Commission
    website in English, French, German, Italian, Spanish and Portuguese, in
    order to test terminological extraction algorithms).

    The output of the macro needs to be corrected manually, as one sentence will
    occasionally be translated as two sentences, and vice versa.

    Let me know if you're interested, and I'll send it along.

    Cheers,

    François MANIEZ
    Maître de Conférences
    Centre de Recherche en Terminologie et en Traduction
    Département de Langues Étrangères Appliquées
    Université Lumière Lyon 2
    maniezf@univ-lyon2.fr
    fmaniez@wanadoo.fr
    http://nte.univ-lyon2.fr/~maniezf/recherche.html

    >>Tomaz Erjavec

    Hi,
    Vanilla can also be found at
    http://nl.ijs.si/telri/Vanilla/
    complete with an accompanying paper and free to download!
    Best,
    Tomaz

    >>Marco Baroni

    Hi!

    There is a version of the Vanilla aligner, pre-compiled for DOS, on the
    following site:

    http://spraakbanken.gu.se/lb/downloads.html

    It is possible to download a compressed archive from there, but, as I
    don't understand Swedish (assuming it is Swedish...), I don't know if
    there are any restrictions on its use.

    Also, if you go to Kenneth Church's publications page, you can download the
    text version of

    Gale, W., and Church, K. (1993) "A Program for Aligning
    Sentences in Bilingual Corpora," Computational Linguistics, 19:1, pp.
    75-102

    which contains the source code for their famous aligner as an appendix.

    Regards,

    Marco Baroni

    cheers
    tony.
    -------------------------------------
    Dr Tony Berber Sardinha
    LAEL, PUC/SP
    (Catholic University of Sao Paulo, Brazil)
    tony4@uol.com.br
    http://lael.pucsp.br/~tony
    [New website]


