[Corpora-List] Summary: GUI for Word Alignment

From: Pete Whitelock (pete.whitelock@sharp.co.uk)
Date: Thu Oct 07 2004 - 12:47:45 MET DST

    On 22nd September, I posted the query:

    >> What's the state of the art in GUIs allowing translators to develop gold standard word aligned bilingual corpora? Is there anything publicly available?
     
    >> Particularly interesting would be software that takes in an automatically generated alignment and allows the user to patch it up.

    >> Also, because I'm interested in aligning a head final and a head initial language pair, software that shows alignments in color rather than by lines would be optimal.

    Here's a summary of the replies (I haven't included the details of input and output representations, which are easy enough to massage, but I've reported the type of display where I could).

    Rada Mihalcea maintains a page of links to Word (and Sentence) Alignment tools and resources at http://www.cs.unt.edu/~rada/wa

    Noah Smith (nasmith@gmail.com) developed a visualisation tool, Cairo, with Mike Jahr during EGYPT, the 1999 Statistical MT Workshop at Johns Hopkins. It's in Java and displays alignments with lines linking words in the two languages. Currently it doesn't allow alignments to be modified but could be extended to do that. It's downloadable from http://www.clsp.jhu.edu/ws99/projects/mt/toolkit/
    and you can see what it looks like at http://www.clsp.jhu.edu/ws99/projects/mt/report/1/9.gif

    Ted Pedersen's (tpederse@d.umn.edu) Alpaco, with a similar line-based display, is available at http://www.d.umn.edu/~tpederse/parallel.html. It's written in Perl and Tk and allows new alignments to be specified.

    Hal Daumé of ISI (hdaume@ISI.EDU) wrote HandAlign, a similar tool for aligning articles and their summaries, available at http://www.isi.edu/~hdaume/HandAlign/. It's in Java, and again produces a line-based display, but the two texts being aligned are independently scrollable.

    Magnus Merkel (magme@ida.liu.se) and his colleagues at Linköping have developed an interactive word aligner (I*Link), written in Java, which displays alignments with color-coding. You can download an academic version from http://www.ida.liu.se/~nlplab/ILink/. A screenshot is attached (ilink.gif).

    Jörg Tiedemann (tiedeman@let.rug.nl) has implemented a demo web interface in Perl for handling parallel corpora, with the possibility of editing automatically word-aligned corpora. You have to register before you can use your own corpora. http://stp.ling.uu.se/cgi-bin/joerg/Uplug

    Philipp Koehn (koehn@csail.mit.edu) has also implemented a web-based tool, an example of which is viewable at http://montev.isi.edu:8000/align-tool/?CORPUS=de-news-morphix&AFILE=full-model1-50-50.gz. Alignments are displayed in matrix format with checkboxes that can be set or cleared.

    Chris Callison-Burch (callison-burch@ed.ac.uk) of Linear B (http://linearb.co.uk) also has available a matrix display alignment tool with colored grid squares representing 'sure' or 'probable' alignments. It also allows output of a list of phrases that can be extracted from the word alignments. A screenshot is attached (linearB.tif).
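
    For readers who haven't seen the matrix style of display used by the last two tools, here is a rough plain-text sketch in Python. It is not taken from either tool; the function name, the (source index, target index) link representation and the S/P/. cell symbols are my own assumptions, chosen only to illustrate the idea.

```python
def render_alignment_matrix(source, target, sure, probable=frozenset()):
    """Return a text grid: one row per source word, one column per target word.

    `sure` and `probable` are sets of (source_index, target_index) pairs.
    This representation is an assumption made for illustration only.
    """
    width = max(len(word) for word in source)
    # Header row: target word positions (last digit only, to keep columns narrow).
    header = " " * (width + 1) + " ".join(str(j % 10) for j in range(len(target)))
    lines = [header]
    for i, word in enumerate(source):
        cells = []
        for j in range(len(target)):
            if (i, j) in sure:
                cells.append("S")   # 'sure' link
            elif (i, j) in probable:
                cells.append("P")   # 'probable' link
            else:
                cells.append(".")   # no link
        lines.append(word.ljust(width) + " " + " ".join(cells))
    return "\n".join(lines)


# Toy example with a crossing link (adjective-noun order differs).
source = ["the", "red", "house"]
target = ["la", "maison", "rouge"]
sure_links = {(0, 0), (1, 2), (2, 1)}

grid = render_alignment_matrix(source, target, sure_links)
print(grid)
```

    A grid like this stays readable when word order differs a lot between the two languages, which is where line-based displays become tangled.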

    Interested readers should consult Rada Mihalcea's web page for further links, including one to Patrik Lambert's Lingua-AlignmentSet toolkit in Perl for handling word alignments (http://www.lsi.upc.es/~lambert/software/AlignmentSet.html). This allows display in matrix format (line format will be implemented in the future), conversion between different representations and evaluation against a gold standard.
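
    As an illustration of what evaluation against a gold standard with 'sure' and 'probable' links can look like, here is a minimal Python sketch of precision, recall and alignment error rate (AER) as commonly defined in the word alignment literature. It is not the toolkit's code, and whether the toolkit reports exactly these figures is an assumption on my part; the function name and the set-of-pairs representation are also hypothetical.

```python
def evaluate_alignment(predicted, sure, probable):
    """Score predicted links against gold 'sure' and 'probable' links.

    All arguments are sets of (source_index, target_index) pairs
    (an assumed representation). Returns (precision, recall, aer).
    """
    if not predicted or not sure:
        raise ValueError("predicted and sure link sets must be non-empty")
    # By the usual convention, every sure link also counts as probable.
    probable = probable | sure
    hits_sure = len(predicted & sure)
    hits_probable = len(predicted & probable)
    precision = hits_probable / len(predicted)
    recall = hits_sure / len(sure)
    aer = 1 - (hits_sure + hits_probable) / (len(predicted) + len(sure))
    return precision, recall, aer


# Toy example: one predicted link matches only a probable gold link.
predicted_links = {(0, 0), (1, 2), (2, 2)}
gold_sure = {(0, 0), (1, 2), (2, 1)}
gold_probable = {(2, 2)}

precision, recall, aer = evaluate_alignment(predicted_links, gold_sure, gold_probable)
print(precision, recall, aer)
```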

    Attachments:

    http://helmer.hit.uib.no/corpora/ilink.gif
    http://helmer.hit.uib.no/corpora/linearB.tif


