Re: Corpora: Using a relational database to store conc pointers

From: Tylman Ule (ule@sfs.nphil.uni-tuebingen.de)
Date: Thu Mar 30 2000 - 10:05:37 MET DST

Next message: Young David: "Corpora: Varieties of English"

Previous message: Remi Zajac: "Corpora: CFP: COLING-2000 Workshop on Using Toolsets and Architectures To Build NLP Systems"
In reply to: Mickel Grönroos: "Corpora: Using a relational database to store conc pointers"
Next in thread: Tom Vanallemeersch: "Re: Corpora: Using a relational database to store conc pointers"

Dear Mickel,

If your files are of a reasonable length, have you considered storing
pointers to files only, and resolving the positions inside the files
automatically on look-up?

Of course, the overhead may be too big if tokens have to be identified
on the fly, but I am using this approach with a tokenised corpus, and
speed is o.k.
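
As a rough illustration of the idea (a minimal sketch, not my actual
setup): the database keeps one row per word and file rather than one per
occurrence, and the positions are found by scanning the file when a word
is looked up. The table name word_file, the functions build_file_index
and lookup, the use of SQLite, and whitespace-separated tokens are all
assumptions made for the example.

```python
import sqlite3
from pathlib import Path


def build_file_index(conn, corpus_files):
    """Store one row per (word, file) pair rather than one per occurrence.

    corpus_files maps a file number to the path of a tokenised text file
    (hypothetical layout: tokens separated by whitespace).
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word_file ("
        "word TEXT NOT NULL, file_no INTEGER NOT NULL, "
        "PRIMARY KEY (word, file_no))"
    )
    for file_no, path in corpus_files.items():
        words = set(Path(path).read_text(encoding="utf-8").split())
        conn.executemany(
            "INSERT OR IGNORE INTO word_file VALUES (?, ?)",
            [(w, file_no) for w in words],
        )
    conn.commit()


def lookup(conn, corpus_files, word):
    """Return (file_no, token_index) hits, resolving positions on look-up."""
    rows = conn.execute(
        "SELECT file_no FROM word_file WHERE word = ? ORDER BY file_no",
        (word,),
    ).fetchall()
    hits = []
    for (file_no,) in rows:
        # Positions are not stored; they are recomputed by scanning the file.
        tokens = Path(corpus_files[file_no]).read_text(encoding="utf-8").split()
        hits.extend((file_no, i) for i, tok in enumerate(tokens) if tok == word)
    return hits
```

With this layout a word that occurs 50,000 times costs at most one row
per file it appears in; the price is a scan of each matching file at
query time, which is why file length matters.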

Hope that helps,
Tylman

Mickel Grönroos wrote:
>
> Does anybody have any experience of using a relational database to store
> index information for a concordance service?
>
> I'm building a test interface for the Bank of Finnish and plan to store
> pointers to specific locations in the corpus in a database column, e.g.
> something like 344:2555 would point to corpus file number 344, byte
> position 2555.
>
> The obvious problem is how one should handle common words, as every
> occurrence of a specific type needs a pointer of its own. So, if the
> frequency of some common word is, say, 50,000, this would generate 50,000
> pointers as well. Putting these in one field in a column seems to be
> rather foolish. Does anybody know how to avoid this?
>
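
For comparison, the pointer scheme quoted above could also be kept in
the database with one row per occurrence instead of one long field. This
is only a sketch of a common relational layout, not something proposed
in this thread: the table name occurrence, the helper names and the use
of SQLite are assumptions, and only the 344:2555 pointer format comes
from the message.

```python
import sqlite3


def parse_pointer(pointer):
    """Split a 'file:offset' string such as '344:2555' into two integers."""
    file_no, byte_pos = pointer.split(":")
    return int(file_no), int(byte_pos)


def store_pointers(conn, word, pointers):
    """Insert one row per occurrence of word (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS occurrence ("
        "word TEXT NOT NULL, file_no INTEGER NOT NULL, "
        "byte_pos INTEGER NOT NULL)"
    )
    # An index on the word column keeps look-ups fast for frequent words.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS occurrence_word ON occurrence (word)"
    )
    conn.executemany(
        "INSERT INTO occurrence VALUES (?, ?, ?)",
        [(word, *parse_pointer(p)) for p in pointers],
    )
    conn.commit()


def pointers_for(conn, word):
    """Return the pointers for word, ordered by file and byte position."""
    rows = conn.execute(
        "SELECT file_no, byte_pos FROM occurrence "
        "WHERE word = ? ORDER BY file_no, byte_pos",
        (word,),
    )
    return [f"{file_no}:{byte_pos}" for file_no, byte_pos in rows]
```

A word with a frequency of 50,000 then takes 50,000 short rows, which
the database can index and page through, rather than one very large
field.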

--
Tylman Ule,  Tel. 07071/29-78490, Fax 07071/550520
        Seminar für Sprachwissenschaft, Universität Tübingen
        Kleine Wilhelmstraße 113, 72074 Tübingen


This archive was generated by hypermail 2b29 : Thu Mar 30 2000 - 10:05:37 MET DST