Re: Corpora: Using a relational database to store conc pointers

From: Tylman Ule (ule@sfs.nphil.uni-tuebingen.de)
Date: Thu Mar 30 2000 - 10:05:37 MET DST


    Dear Mickel,

    If your files are of reasonable length, have you considered storing
    pointers to files only, and resolving the positions inside the files
    automatically at look-up time?

    Of course, the overhead may be too big if tokens have to be identified
    on the fly, but I am using this approach with a tokenised corpus, and
    speed is o.k.
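
    To make the idea concrete, here is a minimal sketch in Python. It is
    an illustration only, not the code in use here: the CORPUS dictionary
    is a hypothetical stand-in for files on disk, the sample texts are
    made up, and the tokeniser is a plain regular expression. The index
    records only which files contain a token; byte positions are
    recomputed by rescanning those files when a query arrives.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for corpus files on disk: file number -> raw bytes.
# The contents are made-up sample data.
CORPUS = {
    344: b"the cat sat on the mat",
    345: b"a dog barked at the cat",
}

# Assumed tokeniser: runs of ASCII word characters.
TOKEN_RE = re.compile(rb"\w+")


def build_file_index(corpus):
    """Map each token to the set of file numbers that contain it."""
    index = defaultdict(set)
    for file_no, data in corpus.items():
        for match in TOKEN_RE.finditer(data):
            index[match.group()].add(file_no)
    return index


def lookup(index, corpus, token):
    """Resolve byte positions at query time by rescanning candidate files."""
    hits = []
    for file_no in sorted(index.get(token, ())):
        for match in TOKEN_RE.finditer(corpus[file_no]):
            if match.group() == token:
                hits.append((file_no, match.start()))
    return hits


FILE_INDEX = build_file_index(CORPUS)
```

    With this layout a very frequent word costs at most one index entry
    per file, and the price is paid at query time, when each candidate
    file has to be tokenised again.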

    Hope that helps,
    Tylman

    Mickel Grönroos wrote:
    >
    > Does anybody have any experience of using a relational database to store
    > index information for a concordance service?
    >
    > I'm building a test interface for the Bank of Finnish and plan to store
    > pointers to specific locations in the corpus in a database column, e.g.
    > something like 344:2555 would point to corpus file number 344, byte
    > position 2555.
    >
    > The obvious problem is how one should handle common words, as every
    > occurrence of a specific type needs a pointer of its own. So, if the
    > frequency of some common word is, say, 50,000, this would generate 50,000
    > pointers as well. Putting these in one field in a column seems to be
    > rather foolish. Does anybody know how to avoid this?
    >
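
    For comparison, one common relational layout avoids packing many
    pointers into a single field by storing one row per occurrence. The
    sketch below uses Python's sqlite3 module; the table and column names
    and the sample rows are assumptions made for illustration, not the
    Bank of Finnish design. A word with 50,000 occurrences becomes
    50,000 short rows, and an index on the word column keeps look-up
    fast.

```python
import sqlite3


def make_db(rows):
    """Create an in-memory table with one row per (word, file, byte offset)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE occurrence ("
        " word TEXT NOT NULL,"
        " file_no INTEGER NOT NULL,"
        " byte_pos INTEGER NOT NULL)"
    )
    # Index covering the look-up, so a query for one word reads only its rows.
    conn.execute(
        "CREATE INDEX occurrence_word ON occurrence (word, file_no, byte_pos)"
    )
    conn.executemany("INSERT INTO occurrence VALUES (?, ?, ?)", rows)
    return conn


def pointers(conn, word, limit=None):
    """Return 'file:byte' pointers for a word, optionally only the first few."""
    sql = (
        "SELECT file_no, byte_pos FROM occurrence"
        " WHERE word = ? ORDER BY file_no, byte_pos"
    )
    params = [word]
    if limit is not None:
        sql += " LIMIT ?"
        params.append(limit)
    return [f"{file_no}:{byte_pos}" for file_no, byte_pos in conn.execute(sql, params)]


# Made-up sample rows; 344:2555 mirrors the pointer format in the question.
CONN = make_db([
    ("ja", 344, 2555),
    ("ja", 344, 10),
    ("ja", 12, 7),
    ("talo", 344, 99),
])
```

    Calling pointers(CONN, "ja", limit=2) fetches only the first two
    pointers, so a concordance page for a frequent word need not pull
    every row at once.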

    --
    Tylman Ule, Tel. 07071/29-78490, Fax 07071/550520
    Seminar für Sprachwissenschaft, Universität Tübingen
    Kleine Wilhelmstraße 113, 72074 Tübingen
    


