Re: Corpora: Using SARA to query other corpora than the BNC

From: Thomas Kuenneth (tommi@linguistik.uni-erlangen.de)
Date: Thu Jun 21 2001 - 11:14:51 MET DST


    > Well if you can't get the prophet to the mountain, why not just
    > move the mountain to the prophet and reformat the corpora into a nice
    > format like sgml using the BNC dtd. In this way we could use them
    > with SARA. Reformatting corpora is what we have to do to use many

    The point is that SARA is far from being an ideal corpus query system! But you
    are certainly right in saying that corpus data should be distributed in a
    standardized format.

    > other corpus access programs, so why not for SARA.

    Because there are things that might have been done better. And although I do not want
    to end up in a debate about operating systems - there are other platforms in
    widespread use and if I am not mistaken the client software is available for
    Windows only (I'd be happy to hear that there is a version that will compile
    under HP-UX - and I am not talking about the server).

    > benefit sgml/xml-formatted corpora might inspire programmers to write
    > "more flexible, more general" software for corpus analysis.

    Metalanguages are ideal for interchange purposes, but I doubt that ANY software
    will handle SGML data describing 100 million annotated word forms efficiently.
    But that's another story.

    Regards
    Thomas

    ---
    Thomas Kuenneth M.A.           Universitaet Erlangen-Nuernberg
    Institut fuer Germanistik         Abteilung Computerlinguistik
    Bismarckstr. 6  *  D-91054 Erlangen  *  Tel.: +49 9131 8529250
    http://www.linguistik.uni-erlangen.de/~tommi
    


