I am doing some lexical research and would like a public-domain
corpus of at least 100 million words of contemporary English (mixed
sources, US and British English preferably) with a large proportion
of transcribed spontaneous speech. I also need a stochastic
tagging program, along with some software for accurate
dependency-based head phrase parsing, so that I can carry out my
studies of pronominal reference in cleft and pseudo-cleft
constructions. Also, if anyone knows where I can FTP free
software for automatic resolution of pronominal reference (mainly
anaphoric), I'd be really grateful for a full list of contact
sites, etc. A friend of mine told me that there is a public-domain
dictionary of modern English freely available on the net, which
would help in my analysis, so if any of you corpusists out there
know where I can get such a machine-readable (m/r) dictionary, that
would be very helpful too.
Thanks in advance!