Re: Corpora: Using a relational database to store conc pointers

From: Oliver Mason (oliver@clg.bham.ac.uk)
Date: Fri Mar 31 2000 - 10:47:20 MET DST

Next message: Wietze Helmantel: "Corpora: FW: Articles on the subject of word sense disambiguation."

Previous message: Christina Rosén: "Corpora: German corp. Thanks"
In reply to: Tom Vanallemeersch: "Re: Corpora: Using a relational database to store conc pointers"
Next in thread: Leidner, Jochen: "RE: Corpora: Using a relational database to store conc pointers"

If you decide to implement your own system instead of using a general-purpose
database, the definitive guide is

Witten, I., Moffat, A., Bell, T. (1994)
Managing Gigabytes: Compressing and Indexing Documents and Images
Van Nostrand Reinhold, New York.

Despite its technical topic, it is very readable, even for people without
a mathematical background.

<shameless plug>
The CUE system (available from the Birmingham Corpus Research Website, and
also through an application called QWICK on the BNC Sampler and the latest
ICAME CD-ROM) is a Java implementation of algorithms described in that book.
Not only the index but also the text itself is compressed, so the fully
indexed corpus takes up less space than the uncompressed plain-text input
file.
</shameless plug>
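To make the idea of a compressed index of concordance pointers concrete, here is a minimal sketch of one technique that Managing Gigabytes covers: store each word's token positions as gaps between successive occurrences, then write the gaps with Elias gamma codes, so frequent words (small gaps) cost only a few bits per occurrence. This is NOT taken from CUE's source; the exact coding scheme CUE uses is not stated here, and all function names are hypothetical. Bits are kept as a '0'/'1' string for readability, whereas a real index would pack them into bytes.

```python
from collections import defaultdict


def build_index(tokens):
    """Map each word form to the ascending list of positions where it occurs."""
    index = defaultdict(list)
    for pos, tok in enumerate(tokens):
        index[tok].append(pos)
    return dict(index)


def to_gaps(positions):
    """Convert ascending positions to d-gaps (first gap measured from -1, so all gaps >= 1)."""
    gaps, prev = [], -1
    for p in positions:
        gaps.append(p - prev)
        prev = p
    return gaps


def from_gaps(gaps):
    """Undo to_gaps: running sum starting from -1."""
    positions, prev = [], -1
    for g in gaps:
        prev += g
        positions.append(prev)
    return positions


def gamma_encode(n):
    """Elias gamma code for n >= 1: (bit length - 1) zeros, then n in binary."""
    if n < 1:
        raise ValueError("gamma codes are defined for n >= 1")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def gamma_decode_all(bits):
    """Decode a concatenation of gamma codes back into a list of integers."""
    out, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # count the unary length prefix
            zeros += 1
            i += 1
        out.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return out


def compress_postings(positions):
    """Hypothetical helper: positions -> gaps -> concatenated gamma codes."""
    return "".join(gamma_encode(g) for g in to_gaps(positions))


def decompress_postings(bits):
    """Hypothetical helper: gamma codes -> gaps -> positions."""
    return from_gaps(gamma_decode_all(bits))


if __name__ == "__main__":
    tokens = "the cat sat on the mat and the dog sat too".split()
    index = build_index(tokens)
    bits = compress_postings(index["the"])
    print(index["the"], bits)  # [0, 4, 7] 100100011
    assert decompress_postings(bits) == index["the"]
```

In this toy example the three occurrences of "the" fit in 9 bits instead of three fixed-width integers. The book also discusses other gap codes (for example Golomb codes) that can do better for particular frequency distributions; gamma is used here only because it is short to write.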

Oliver Christ pointed that book out to me about five years ago. He was
working on the Stuttgart corpus access system at the time, and I believe that
system is also based on it.

Oliver

-- 
//\\ computer officer | corpus research | department of english | school of  -
//\\ humanities | university of birmingham | edgbaston | birmingham b15 2tt  -
\\// united kingdom | phone +44-(0)121-414-6206 | fax +44-(0)121-414-5668/\  -
\\// mobile 07050 104504 | http://www.clg.bham.ac.uk | o.mason@bham.ac.uk\/  -
