Re: [Corpora-List] Some comments on aligners

From: Ute Römer (ute.roemer@uni-koeln.de)
Date: Sat Sep 07 2002 - 11:45:41 MET DST

    Dear all,

    Some weekend thoughts on Corpora List discussions -- in reply to Diana
    Santos' recent posting.

    I was just wondering: is it really "a waste of time" to discuss special
    software tools, their use, and the problems you encounter while using
    them, on an email list whose purpose is, or ought to be, to exchange ideas
    on specific topics and to help people solve corpus linguistic problems?
    And does it make a difference whether the tools in question are freely
    available or not? What's wrong with explicitly asking for help with a
    certain program, as Sampo Nevalainen did? I do not much like the idea of
    having to think twice before sending queries on commercially available
    corpora and corpus analysis tools to the list, and I suspect that other
    list members might feel the same.

    Have a good weekend all of you!

    Best,
    Ute

    ----- Original Message -----
    From: "Santos Diana" <Diana.Santos@sintef.no>
    To: <corpora@hd.uib.no>
    Sent: Thursday, September 05, 2002 1:04 PM
    Subject: [Corpora-List] Some comments on aligners

    > Dear colleagues,
    >
    > It seems to me somehow a waste of time and resources to be discussing
    > aligners for a particular commercial application such as ParaConc on this
    > list (I know that was the initial question...), given that there are so
    > many other systems that may offer better search functionality for
    > parallel corpora and which are, moreover, free and already available.
    >
    > So, after some reflection, I decided, to prevent some naive readers of the
    > list from concluding that the only existing aligners were the ones
    > discussed in the previous mail thread, to talk about our approach in
    > COMPARA, basically to suggest to anyone involved in parallel corpora work
    > to use
    >
    > 1) the IMS Corpus Workbench developed at Stuttgart (Stefan Evert and
    > Ulrich Heid)
    > 2) and the EasyAlign aligner that comes with it and has all the
    > functionalities that have been described in the previous mails (namely it
    > aligns, or accepts a previous alignment, so that one can easily
    > incorporate the results of manual revision into a powerful corpus
    > querying system)
    >
    > For those that would complain that the system is in Unix / Linux and
    > therefore not usable for naive users, the obvious solution is to create a
    > Web frontend as we did in COMPARA, see http://www.portugues.mct.pt/COMPARA
    >
    > I'm not paid to advertise IMS-CWB nor to align texts for other projects
    > (although we do it occasionally for some people when one of the languages
    > of the parallel texts is Portuguese), but I really think, after careful
    > consideration of many other systems and approaches, that this is the
    > best way to go.
    >
    > People interested in technical details of exactly how the DISPARA setup
    > works can read as well, after the Web pages, the paper
    >
    > Santos, Diana. "DISPARA, a system for distributing parallel corpora on the
    > Web", in Elisabete Ranchhod & Nuno J. Mamede (eds.), Advances in Natural
    > Language Processing (Third International Conference, PorTAL 2002, Faro,
    > Portugal, June 2002, Proceedings), LNAI 2389, Springer, 2002, pp.209-218.
    >
    > and here is a soft presentation for non-technical users
    >
    > Frankenberg-Garcia, Ana & Diana Santos. "Introducing COMPARA, the
    > Portuguese-English parallel translation corpus", paper presented at
    > CULT'2000, to appear in a volume of selected contributions, St.Jerome,
    > http://www.linguateca.pt/Diana/download/FrankenbergSantos.rtf
    > http://www.linguateca.pt/Diana/download/FrankenbergSantos.ps
    >
    > The service we occasionally provide (NB! only when one of the languages is
    > Portuguese!!! -- to be fair, we have so far only tried with
    > English-Portuguese and Norwegian-Portuguese pairs...) is to accept texts
    > in text-only format (e.g., TEXT1.po and TEXT1.en) already aligned by
    > paragraph (this means one paragraph per line in each text), submit them to
    > EasyAlign and send the output back sentence aligned. (Paragraphs can of
    > course be titles or other things.) I've prepared an example of text input
    > and text output for those interested in the service at
    > http://acdc.linguateca.pt/example_alignment.html. (Note that it has to
    > involve Portuguese as one of the languages.)
    >
    > However, I would warmly encourage people to actually use the IMS-CWB
    > themselves and create their own Web services. The advantages of using the
    > query power (also in translation corpora) are tremendous.
    >
    > Diana
    > ************************************************************************
    > Diana Santos Computational processing of Portuguese
    >
    > SINTEF Telecom & Informatics Tel. (direct line) +47 22 06 73 12
    > Forskningsveien 1 Tel. +47 22 06 73 00
    > Box 124 Blindern Fax. +47 22 06 73 50
    > N-0314 Oslo Email: Diana.Santos@sintef.no
    > Norway http://www.portugues.mct.pt/
    > ************************************************************************
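
The quoted message asks for input that is already aligned by paragraph: paired text-only files such as TEXT1.po and TEXT1.en, with one paragraph per line in each. A minimal sketch of how one might sanity-check such a pair before sending it, in Python. The function names and the default encoding are assumptions made for illustration; nothing here is part of EasyAlign or IMS-CWB.

```python
from pathlib import Path


def read_paragraphs(path, encoding="utf-8"):
    """Read a text-only file and return its lines (one paragraph per line).

    The encoding is an assumption; pass encoding="latin-1" for older files.
    """
    return Path(path).read_text(encoding=encoding).splitlines()


def check_paragraph_alignment(source_lines, target_lines):
    """Return a list of human-readable problems; an empty list means no issue found.

    This only checks the one-paragraph-per-line convention. It does not
    judge whether the paired paragraphs are actually translations.
    """
    problems = []
    if len(source_lines) != len(target_lines):
        problems.append(
            f"line count differs: {len(source_lines)} vs {len(target_lines)}"
        )
    # Compare line by line up to the length of the shorter file.
    for number, (src, tgt) in enumerate(zip(source_lines, target_lines), start=1):
        if bool(src.strip()) != bool(tgt.strip()):
            problems.append(f"line {number}: one side is empty")
    return problems
```

With real files, one would call `check_paragraph_alignment(read_paragraphs("TEXT1.po"), read_paragraphs("TEXT1.en"))` and fix any reported lines by hand before submitting the pair.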