Re: Kolhapur Corpus

Jane A. Edwards (edwards@cogsci.Berkeley.EDU)
Sat, 18 Feb 1995 13:35:38 -0800

I can't tell quite how much detail you're looking for, but I am
appending what I know about the Kolhapur as a starting point, with
references from Shastri.
The following is excerpted from the electronic version of my
corpus survey in Edwards/Lampert (eds.) (1993),
available via anonymous ftp from the ICAME archive in Bergen at
nora.hd.uib.no and also, I think, via World Wide Web
(http://www.hd.uib.no/).

Best Wishes,
-Jane Edwards
---------------------------
8. The Kolhapur Corpus of Indian English (Shastri, 1985, 1988)
contains 1 million words of written Indian English from the year 1978.
Its texts were selected from the same text categories as the Brown
Corpus, and the corpus is available from ICAME.

---------------------------
Some information on the Brown corpus:
1. The Brown Corpus (The Standard Corpus of Present-Day Edited
American English) (Francis, 1982; Francis & Kucera, 1979, 1982; Kucera,
1992; Kucera & Francis, 1967) is a corpus of 1 million words of written
American English printed in the year 1961. It was the first corpus to
be put on computer medium and is the most analyzed corpus of English to
date. It consists of 500 written American English texts of 2,000 words
apiece, selected to represent diverse genres of written American
English. There are two main sections: Informative Prose and
Imaginative Prose. Genres represented include newspaper reportage,
press editorials, memoirs, religion, science fiction, detective
fiction, and romance novels (excluding drama and fiction with more than
50% dialog). This corpus of running text is available for academic
research for the cost of materials from both the Oxford Text Archive
and the ICAME archive and is contained on the ICAME CD-ROM available
through NCCH (see above).
[...]
---------------------------
References for the Kolhapur:

Shastri, S. V. (1985). A computer corpus of present-day Indian English: A
preliminary report. ICAME Journal, 9, 9-10.
Shastri, S. V. (1988). The Kolhapur Corpus of Indian English and work
done on its basis so far. ICAME Journal, 12, 15-26.

---------------------------