The disagreement between Ted and Adam seems to arise from different
assumptions about what IR stoplists contain. Adam seems to assume
that stoplists contain (mostly?) closed-class words, which is indeed
the case in some systems, whereas Ted assumes that stoplists contain
frequent words. Obviously, not all closed-class words are frequent,
and not all frequent words are from the closed classes.

This is a clear example, for the original poster, of how
task-specific stoplists must be. For IR, you'd want all highly
frequent words in your stoplist, regardless of their part of speech
(e.g. "web" is stoplisted on most Web search engines). On the other
hand, for author ID, you'd want to look *only* at closed-class words,
as demonstrated by the authorship studies of _The Federalist Papers_.
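
To make the contrast concrete, here is a rough Python sketch of the two
kinds of list. Everything in it is an assumption made for illustration:
the toy documents, the cutoff k, the helper names, and above all the
tiny CLOSED_CLASS set, which is nowhere near a real closed-class
inventory. It is not how any particular search engine or authorship
study builds its lists.

```python
from collections import Counter

# Hypothetical, deliberately tiny closed-class list (illustration only).
CLOSED_CLASS = {"the", "of", "and", "to", "a", "in", "on", "by", "upon", "whilst"}


def frequency_stoplist(docs, k):
    """IR-style stoplist: the k most frequent tokens, whatever their part of speech."""
    counts = Counter(tok for doc in docs for tok in doc.lower().split())
    return {word for word, _ in counts.most_common(k)}


def closed_class_profile(text):
    """Author-ID-style features: relative frequency of each closed-class word."""
    tokens = text.lower().split()
    counts = Counter(tok for tok in tokens if tok in CLOSED_CLASS)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {word: counts[word] / total for word in sorted(CLOSED_CLASS)}


# Toy "Web" collection (made up): "web" is frequent but not closed-class.
docs = [
    "the web page links to the web site",
    "search the web for a web server",
    "a web crawler visits the web",
]

ir_stoplist = frequency_stoplist(docs, k=2)
profile = closed_class_profile("the web of the web")
```

On this toy collection the frequency-based list picks up "web", an
open-class word, while the closed-class profile ignores it entirely and
still reports a (zero) rate for rare closed-class words such as "upon".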
I. Dan Melamed melamed@linc.cis.upenn.edu
University of Pennsylvania http://www.cis.upenn.edu/~melamed/