Re: Corpora: Using SARA to query other corpora than the BNC

From: Lou Burnard (lou.burnard@computing-services.oxford.ac.uk)
Date: Fri Jun 22 2001 - 12:39:18 MET DST

    On Thu, 21 Jun 2001, Thomas Kuenneth wrote:

    > The point is that Sara is far from being an ideal corpus query system! But you
    > are certainly right in saying that corpus data should be distributed in a
    > standardized format.

    Good to see that we have some agreement on that at any rate. I don't think
    anyone (certainly not me) has ever claimed that SARA was an ideal corpus
    query system. I do, however, claim that it's one of the best currently
    around for handling XML-encoded corpora of more than trivial size.
    If you have specific suggestions about facilities it lacks, or ways it
    could be improved, I hope you'll share them.

    > Because there are things that might have been done better. And although I do not want
    > to end up in a debate about operating systems - there are other platforms in
    > widespread use and if I am not mistaken the client software is available for
    > Windows only (I'd be happy to hear that there is a version that will compile
    > under HP/UX - and I am not talking about the server).

    SARA was designed before Java made cross-platform development
    (relatively) easy. At that time, the common wisdom was that one should develop
    platform-specific clients which could interact with platform-independent
    servers, and that's the design we followed. The current SARA client can
    only run on Windows, but there is no reason why clients should not be
    developed for other platforms, as indeed Hans Martin and his colleagues
    have impressively demonstrated.

    > Meta languages are ideal for interchange purposes but I doubt that ANY software
    > will handle SGML data describing 100 million annotated word forms efficiently.
    > But that's another story.

    On what grounds do you make this assertion? I suppose it all depends on what
    you mean by "handle efficiently", but it's simply not true that NO
    software can handle SGML data on that scale. And what would you advocate
    as an alternative?

    Lou


