Re: [Corpora-List] On tools for indexing and searching large corpora

From: Olonichev Sergei (olonichev@scnsoft.com)
Date: Wed Nov 20 2002 - 10:51:28 MET

Next message: Amy Neale: "[Corpora-List] Short Course: Corpus Design and Use"

Previous message: krausse: "[Corpora-List] Environmental Engineering English"
In reply to: Arne Fitschen: "Re: [Corpora-List] On tools for indexing and searching large corpora"
Next in thread: Sylvain Loiseau: "Re: [Corpora-List] On tools for indexing and searching large corpora"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Agree with Arne Fitschen, and the source code of the system is probably
available.
It was used to index a 300+ million-word corpus of English and showed
pretty good performance.
It can be compiled without any problems under Linux and Cygwin.

BR,
Sergei

----- Original Message -----
From: "Arne Fitschen" <fitschen@ims.uni-stuttgart.de>
To: <corpora@lists.uib.no>
Sent: 19 November 2002 15:04
Subject: Re: [Corpora-List] On tools for indexing and searching large corpora

> mdavies@ilstu.edu wrote:
> >
> > This is a question that I've asked myself many times. I would love to see a
> > book that discussed the approach taken by the BNC, the BoE, CREA, corpora based
> > on the IMS Corpus Workbench (such as O Público), etc to "look under the hood"
> > and see how each of these corpora and indexing schemes is organized. As you
> > mentioned, as more and more people start creating 100+ million word corpora, it
> > would be a shame if they all ended up having to re-invent the wheel.
>
>
> I don't know of such a book, but for the IMS Corpus Workbench I believe
> that some of the ideas concerning data storage and indexing schemes were
> taken from this book:
>
> Ian H. Witten, Alistair Moffat, and Timothy C. Bell
> Managing Gigabytes
> Compressing and Indexing Documents and Images
> May 1999
>
> (here's a link to the second edition of the book:
> http://www.cs.mu.oz.au/mg/).
>
> Regards,
> Arne Fitschen
>
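
One of the indexing ideas covered in Managing Gigabytes is to store, for each word, the sorted list of corpus positions where it occurs, replace those positions with the gaps between successive occurrences (d-gaps), and compress the gaps with a variable-length code such as the Elias gamma code. The following is a minimal Python sketch of that idea under simplifying assumptions: the whole index is held in memory, and the bits are kept as a '0'/'1' string rather than packed into bytes. It is illustrative only and is not the actual storage format of the IMS Corpus Workbench; all function names are hypothetical.

```python
from collections import defaultdict


def build_index(tokens):
    """Map each word type to the sorted list of corpus positions where it occurs."""
    index = defaultdict(list)
    for pos, word in enumerate(tokens):
        index[word].append(pos)  # positions arrive in increasing order
    return dict(index)


def to_gaps(positions):
    """Replace absolute positions with d-gaps; starting from -1 keeps every gap >= 1."""
    gaps, prev = [], -1
    for p in positions:
        gaps.append(p - prev)
        prev = p
    return gaps


def from_gaps(gaps):
    """Undo to_gaps by taking running sums of the gaps."""
    positions, prev = [], -1
    for g in gaps:
        prev += g
        positions.append(prev)
    return positions


def gamma_encode(n):
    """Elias gamma code for n >= 1: (bit length - 1) zeros, then n in binary."""
    if n < 1:
        raise ValueError("gamma codes are defined only for n >= 1")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def gamma_decode_all(bits):
    """Decode a concatenation of gamma codes back into a list of integers."""
    values, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # the count of leading zeros gives the length
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values


def compress_postings(positions):
    """Hypothetical helper: positions -> d-gaps -> concatenated gamma codes."""
    return "".join(gamma_encode(g) for g in to_gaps(positions))


def decompress_postings(bits):
    """Hypothetical helper: inverse of compress_postings."""
    return from_gaps(gamma_decode_all(bits))


if __name__ == "__main__":
    tokens = "the cat sat on the mat and the dog sat too".split()
    index = build_index(tokens)
    bits = compress_postings(index["the"])
    print(index["the"], bits, decompress_postings(bits))
```

Run on the sample sentence, the word "the" occurs at positions 0, 4 and 7, whose d-gaps 1, 4 and 3 compress to the 9-bit string "100100011". Frequent words have small gaps and therefore short codes, which is why this kind of scheme pays off on 100+ million word corpora; a real index would pack the bits into bytes and keep the postings on disk.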

This archive was generated by hypermail 2b29 : Wed Nov 20 2002 - 11:11:56 MET