Corpora: re: corpus indexing program

From: Lou Burnard (lou.burnard@computing-services.oxford.ac.uk)
Date: Wed Jun 05 2002 - 14:55:57 MET DST

    On Sat, Jun 01, 2002 at 12:53:16PM +0200, E.S. wrote:
    > Can anyone direct me to a corpus indexing program that does fast
    > searches? I have dabbled in Wordsmith and Winconcord for Windows, but
    > neither does a complete index of my entire database of text,
    > approximately 2 GB, and both seem to take about 20 minutes on a Pentium
    > 233 for one search.

    The SARA program developed for the BNC (which is slightly more than 2
    GB of text) would handle this job easily. Whether it would search
    better than your current combination of tools depends on how the text
    in your corpus is organized. If you would like to send me a few sample
    files, I'd be glad to test it out for you.

    We are currently working on a major new version of the SARA program,
    which will include several enhancements to the indexer. Any strong
    views people have on how indexing of large corpora should be specified
    would be gratefully received. I hope to demonstrate the new version at
    TALC next month.

    Lou Burnard

    ----- End forwarded message -----


