Corpora: Using a relational database to store conc pointers

From: Mickel Grönroos (mcgronro@ling.helsinki.fi)
Date: Thu Mar 30 2000 - 09:37:39 MET DST

    Dear colleagues,

    Does anybody have any experience of using a relational database to store
    index information for a concordance service?

    I'm building a test interface for the Bank of Finnish and plan to store
    pointers to specific locations in the corpus in a database column, e.g.
    something like 344:2555 would point to corpus file number 344, byte
    position 2555.
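
    To make the pointer format concrete, here is a minimal Python sketch of
    how such a string could be split and used to seek into a corpus file.
    The names ConcPointer, parse_pointer, format_pointer and read_context
    are purely illustrative, and how file number 344 maps to an actual path
    is left open, so read_context simply takes a path as an assumption.

```python
from typing import NamedTuple


class ConcPointer(NamedTuple):
    """A corpus location: file number plus byte offset (hypothetical type)."""
    file_no: int
    byte_pos: int


def parse_pointer(text: str) -> ConcPointer:
    """Split a 'file:byte' string such as '344:2555' into its two integers."""
    file_part, pos_part = text.split(":")
    return ConcPointer(int(file_part), int(pos_part))


def format_pointer(pointer: ConcPointer) -> str:
    """Turn a pointer back into the 'file:byte' string form."""
    return f"{pointer.file_no}:{pointer.byte_pos}"


def read_context(path: str, byte_pos: int, width: int = 40) -> bytes:
    """Return up to `width` bytes on either side of `byte_pos` in the file.

    The path is supplied by the caller; resolving a file number to a path
    is an assumption this sketch does not make.
    """
    start = max(0, byte_pos - width)
    with open(path, "rb") as corpus_file:
        corpus_file.seek(start)
        return corpus_file.read(byte_pos - start + width)
```

    With this, parse_pointer("344:2555") gives file number 344 and byte
    position 2555, and read_context returns raw bytes that still need
    decoding in whatever encoding the corpus file uses.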

    The obvious problem is how to handle common words, as every
    occurrence of a given word type needs a pointer of its own. So if the
    frequency of some common word is, say, 50,000, this would generate 50,000
    pointers as well. Putting all of these into a single field seems
    rather foolish. Does anybody know how to avoid this?
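
    For comparison, the alternative I can picture to one big field is one
    row per occurrence, with an index on the word. Below is a minimal sketch
    using SQLite from Python. The table name, column names and sample rows
    are made up for illustration and are not the actual Bank of Finnish
    setup; whether this holds up for very frequent words is exactly what I
    am unsure about.

```python
import sqlite3

# In-memory database, for illustration only.
conn = sqlite3.connect(":memory:")

# One row per occurrence instead of one field holding every pointer.
# Table and column names are hypothetical.
conn.execute(
    """
    CREATE TABLE occurrence (
        word     TEXT    NOT NULL,
        file_no  INTEGER NOT NULL,
        byte_pos INTEGER NOT NULL
    )
    """
)
# Index on the word so a lookup does not have to scan the whole table.
conn.execute("CREATE INDEX idx_occurrence_word ON occurrence (word)")

# Invented sample data: (word, corpus file number, byte position).
sample_rows = [
    ("ja", 344, 2555),
    ("ja", 344, 2601),
    ("ja", 345, 17),
    ("talo", 344, 2570),
]
conn.executemany("INSERT INTO occurrence VALUES (?, ?, ?)", sample_rows)


def pointers_for(word, limit=100, offset=0):
    """Return one page of 'file:byte' pointers for a word, in corpus order."""
    cursor = conn.execute(
        "SELECT file_no, byte_pos FROM occurrence "
        "WHERE word = ? ORDER BY file_no, byte_pos LIMIT ? OFFSET ?",
        (word, limit, offset),
    )
    return [f"{file_no}:{byte_pos}" for file_no, byte_pos in cursor]
```

    The LIMIT and OFFSET parameters are there so that a concordance page
    only fetches the pointers it is going to display, rather than all
    50,000 at once.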

    All comments are welcome.

    Thanks,

    Mickel Grönroos
    Helsinki

    www.ling.helsinki.fi/~mcgronro/ | Mickel.Gronroos@helsinki.fi
    ---------------------------------|----------------------------
    Inst. för allmän språkvetenskap | Dep. of General Linguistics
    PB 4 (Fabiansgatan 28) | tfn/phone +358-9-191 22707
    FI-00014 Helsingfors universitet | fax +358-9-191 23598


