Corpora: Corpus of spoken Bulgarian available

Kjetil Ra Hauge
Wed, 24 Jun 1998 16:03:05 +0200

A corpus of spoken Bulgarian, amounting to approx. 50 000 word tokens, is
now available at the address:

These texts represent one half of the corpus that served as the material
for Cvetanka Nikolova: Chestoten rechnik na balgarskata razgovorna rech (A
Frequency Dictionary of Colloquial Bulgarian), Nauka i izkustvo, Sofia
1987. The texts are made available with the kind permission of Cvetanka
Nikolova and with the assistance of Tzvetomira Venkova, who entered them
into the computer from the original index cards.

At the same address you can also find the _avtoreferat_ of Tzvetomira
Venkova's dissertation: Sastavnite sajuzi s element _da_ ot gledna tochka
na kompjutarnija tekstov analiz (Formalen model i proekt za ekspertna
sistema), Sofia 1997.

--- Kjetil Ra Hauge, U. of Oslo.
--- Tel. +47/22 85 67 10, fax +47/22 85 41 40