Re: Corpora: Using SARA to query other corpora than the BNC

From: Thomas Kuenneth (tommi@linguistik.uni-erlangen.de)
Date: Wed Jun 20 2001 - 14:28:51 MET DST


    in response to a posting from Lou Burnard Wed, 20 Jun 2001:

    > I am in the process of writing a brief guide to how this can be done,

    I cannot refrain from making some remarks here. Sara is undoubtedly among the
    most frequently used programs in this field (as the BNC plays an important role
    in corpus linguistics).
    Nonetheless, I doubt that using Sara to query other corpora is desirable. It
    is well known that the software has been tailored to fit the structure of the
    BNC (or vice versa, which does not really matter here). I am not sure that the
    software is flexible enough to meet the requirements of many other corpora. We
    have to keep in mind that corpus data as such has no fixed, uniform structure:
    consider the presence or absence of POS tags and base forms, or of meta
    information such as titles of sample texts, legal information, dates of
    publication, lists of categories the samples belong to, ...
    And we must not forget that (at least) the Sara client has its weaknesses,
    such as limits on the size of query results and being tied to a particular
    hardware platform.
    There are in fact a number of disadvantages and shortcomings of such
    proprietary systems that I could address here (some of which I am going to
    address at a conference soon). They lead me to call for a more flexible, more
    general approach.

    In the future, users should be able to choose which corpus tool to use for
    querying ANY corpus. Some such programs are already available (though they
    have limitations, too). Others are being developed right now.

    Regards
    Thomas Künneth

    ---
    Thomas Kuenneth M.A.           Universitaet Erlangen-Nuernberg
    Institut fuer Germanistik         Abteilung Computerlinguistik
    Bismarckstr. 6  *  D-91054 Erlangen  *  Tel.: +49 9131 8529250
    http://www.linguistik.uni-erlangen.de/~tommi
    


