Re: Corpora: Corpus Linguistics User Needs

Cyril Belica
Thu, 30 Jul 98 15:33:15 +0200


I agree with Oliver Mason that a concerted initiative aimed at user needs analysis and co-ordinated design of corpus processing software might be of some interest. I have been managing the COSMAS (Corpus Storage, Management and Access System) project here in Mannheim for seven years. Since 1993, the COSMAS search engine running in Mannheim has been serving some 100 linguists in our institute and several hundred users all over the world, and it currently provides access to more than 500M running words of German text. We support proximity searches, concordancing, statistical collocation analysis and clustering, dynamic corpus composition, search result caching, morpho-syntactic annotations, stemming, etc. In the next version, we will implement concurrent dynamic annotations and include sound. A rudimentary WWW interface to the current search engine is publicly available at

With COSMAS, we have gained considerable experience with users of large corpora and hope to find partners for further discussion and collaboration. To put it more plainly: arguing about WHETHER or NOT to do it is a waste of time for me. Is anyone out there interested in discussing HOW to do it?

Wishing you all a pleasant day,

Cyril Belica
Institut fuer deutsche Sprache