RE: Corpora: kwic concordances with Perl...or rather use a

Andrew Harley (aharley@cup.cam.ac.uk)
Mon, 11 Oct 1999 14:07:47 +0100

>Does anybody use RDBMS for corpus storage? I'm only
>aware of one forthcoming work at U Erlangen and
>the efforts of Gerry Knowles and co-workers at U Lancaster
>on the spoken side. I'd be interested to hear of
>others on this list.

We use an RDBMS alongside straight text files to store the Cambridge
International Corpus. In our investigations to date, an RDBMS does not
help with the bulk of the corpus text and the main indexes (if anything,
it makes matters worse), but it is ideal for:

(1) storing the links into the main indexes, e.g. give me the main index
entry for "word" or "[noun]"

(2) storing the "header" information, the detailed categorising of each
citation
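Here is one way to picture that split: a minimal sketch in Python using the standard-library sqlite3 module. It assumes the main index is a flat file holding one line of citation ids per entry (simulated here with an in-memory buffer), and it uses hypothetical tables `index_links` and `citation_header`. The header columns (`variety`, `medium`) and the sample data are invented for illustration and are not the actual Cambridge International Corpus schema. Only the small items go in the RDBMS, namely each term's byte offset into the big index file and each citation's header record. The bulky index data itself stays in the flat file.

```python
import io
import sqlite3

# Hypothetical main index kept OUTSIDE the RDBMS: term -> citation ids.
SAMPLE_ENTRIES = {
    "word": "c1 c3",
    "[noun]": "c1 c2 c3",
}

def build_index_file(entries):
    """Write entries to a flat binary buffer; return it plus term -> (offset, length)."""
    buf = io.BytesIO()
    links = {}
    for term, postings in entries.items():
        data = (postings + "\n").encode("utf-8")
        links[term] = (buf.tell(), len(data))
        buf.write(data)
    return buf, links

index_file, links = build_index_file(SAMPLE_ENTRIES)

db = sqlite3.connect(":memory:")
# (1) Links into the main index: where each term's entry lives in the flat file.
db.execute("CREATE TABLE index_links (term TEXT PRIMARY KEY, byte_offset INTEGER, byte_length INTEGER)")
db.executemany("INSERT INTO index_links VALUES (?, ?, ?)",
               [(t, off, n) for t, (off, n) in links.items()])
# (2) Header information: hypothetical categorisation of each citation.
db.execute("CREATE TABLE citation_header (citation_id TEXT PRIMARY KEY, variety TEXT, medium TEXT)")
db.executemany("INSERT INTO citation_header VALUES (?, ?, ?)", [
    ("c1", "British", "written"),
    ("c2", "American", "spoken"),
    ("c3", "American", "written"),
])

def main_index_entry(term):
    """Find the term's offset via the RDBMS, then read its entry from the flat file."""
    row = db.execute(
        "SELECT byte_offset, byte_length FROM index_links WHERE term = ?", (term,)
    ).fetchone()
    if row is None:
        return []
    offset, length = row
    index_file.seek(offset)
    return index_file.read(length).decode("utf-8").split()

def citations_in_variety(citation_ids, variety):
    """Keep only citations whose header records the given variety."""
    if not citation_ids:
        return []
    placeholders = ",".join("?" for _ in citation_ids)
    sql = ("SELECT citation_id FROM citation_header "
           f"WHERE variety = ? AND citation_id IN ({placeholders}) ORDER BY citation_id")
    return [r[0] for r in db.execute(sql, (variety, *citation_ids))]
```

For example, `main_index_entry("[noun]")` returns `["c1", "c2", "c3"]`, and `citations_in_variety(main_index_entry("word"), "American")` returns `["c3"]`. The RDBMS answers the small, structured lookups, while the heavy sequential reads go straight to the flat file.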

Andrew Harley
Systems Development Manager - ELT Reference
Cambridge University Press

Direct line: (01223)325880
Fax: (01223)325984
