Re: word frequency lists?

Mark Johnson (Mark.Johnson@xerox.fr)
Fri, 24 Nov 1995 10:11:56 +0100

>
> ... the act of balancing has created a corpus
> which is utterly unlike any real bit of text. thus the frequency
> counts taken from such a balanced corpus cannot be taken as a
> characterization of any real text. these counts may, perhaps, be used
> to highlight the deviations in a particular sample, but even this use
> is subject to serious error.
>
> the moral?
>
> do your own counts on material appropriate to the task at hand!
>
>

Does this mean that we can't find useful ``robust'' statistical
generalizations which hold over most domains?
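
One rough way to probe this question is to do the counts on two samples and see which words deviate most. The following is only a minimal sketch under stated assumptions: the function names `relative_frequencies` and `largest_deviations` are hypothetical, the regular-expression tokenizer is deliberately crude, and the absolute difference in relative frequency is just one possible measure of deviation, not one proposed in this thread.

```python
from collections import Counter
import re


def relative_frequencies(text):
    """Map each lowercased word to its share of all tokens in `text`."""
    # Crude tokenizer (an assumption): runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter(tokens)
    return {word: n / total for word, n in counts.items()}


def largest_deviations(sample, reference, top=5):
    """Return the `top` words whose relative frequency differs most.

    Positive values mean the word is more frequent in `sample`,
    negative values mean it is more frequent in `reference`.
    """
    sample_freq = relative_frequencies(sample)
    reference_freq = relative_frequencies(reference)
    vocabulary = set(sample_freq) | set(reference_freq)
    diffs = {
        word: sample_freq.get(word, 0.0) - reference_freq.get(word, 0.0)
        for word in vocabulary
    }
    # Sort by size of deviation, breaking ties alphabetically.
    ranked = sorted(diffs.items(), key=lambda item: (-abs(item[1]), item[0]))
    return ranked[:top]
```

If the top of this ranking changes substantially from one reference corpus to another, that would be some evidence that the counts are domain-bound rather than robust; with small samples the differences are of course noisy.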

What implications does this have for broad-coverage parsing? Should
we be looking for systems that try to automatically adapt to (i.e.,
learn) the domain they are given to parse?

Mark Johnson
(on sabbatical from Brown)

---------------------------------------------------------------------
Rank Xerox Research Centre      Tel: (33) 76 61 50 37
6, chemin de Maupertuis              (33) 76 61 50 50
F38240 Meylan
FRANCE                          Fax: (33) 76 61 50 99
---------------------------------------------------------------------