Dear list members, does anybody know about the so-called "TASA corpus"?
It contains 10 million words of UNMARKED high-school level English text on
Language arts, Health, Home economics, Industrial arts, Science, Social studies, and Business.
It is divided into 37,600 text samples, also called contexts or "documents"
(an average of 166 words per document).
If the corpus is commercial, who owns it and what are the terms for obtaining it?
The references I know of:
http://www.rni.org/kanerva/cogsci2k-poster.txt
http://lsa.colorado.edu/spaces.html
--
Regards,
Vladimir Rykov
PhD in Computational Linguistics
Personal web-site: rykov.narod.ru
mailto: rykov2000@mail.ru
Si etiam omnes - ego non
English version: www.blkbox.com/~gigawatt/rykov.html
This archive was generated by hypermail 2b29 : Fri Apr 30 2004 - 14:46:17 MET DST