> well, i am not at all surprised that i have got hold of the wrong end
> of the stick and that there is a much larger can of worms here.
But stratification does NOT assume equal costs, and there is a ton of
work in the sampling literature about optimum sample design with
differential costs.
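
As a rough illustration of what that literature does with costs, here
is a minimal sketch of the textbook cost-optimal allocation for a
stratified mean: with a linear cost model (fixed cost plus a per-unit
cost c_h in each stratum), the variance is minimised for a given budget
by taking n_h proportional to W_h * S_h / sqrt(c_h). The function name,
argument names and the numbers in the usage note are made up for
illustration; the sketch ignores finite population corrections and
leaves the sample sizes unrounded.

```python
from math import sqrt


def cost_optimal_allocation(weights, sds, unit_costs, budget, fixed_cost=0.0):
    """Per-stratum sample sizes for a stratified mean under a budget.

    weights    -- stratum weights W_h (shares of the population)
    sds        -- within-stratum standard deviations S_h (assumed known)
    unit_costs -- cost c_h of sampling one unit in stratum h
    budget     -- total money available, including fixed_cost

    Assumes the linear cost model: fixed_cost + sum(c_h * n_h) == budget.
    """
    spendable = budget - fixed_cost
    # Normalising constant chosen so the whole spendable budget is used.
    denom = sum(w * s * sqrt(c) for w, s, c in zip(weights, sds, unit_costs))
    # n_h is proportional to W_h * S_h / sqrt(c_h): sample more where the
    # stratum is big or variable, less where each unit is expensive.
    return [spendable * (w * s / sqrt(c)) / denom
            for w, s, c in zip(weights, sds, unit_costs)]
```

For instance, with two equally sized and equally variable strata where
a unit costs four times as much in the second,
cost_optimal_allocation([0.5, 0.5], [1.0, 1.0], [1.0, 4.0], 100.0)
puts twice as many units in the cheap stratum while spending the whole
budget.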
> to make this concrete, suppose that you can get a certain amount of
> usenet news text per day, and that you can get a certain amount of
> newswire text per dollar (pound, ecu...) and that you can get a
> certain much smaller amount of spoken language per dollar and that you
> can digitize some amount of out of copyright literary text per dollar.
>
> now further suppose that you have $100,000 and 100 days to build a
> corpus. how do you allocate your resources to optimally estimate some
> quantity (i.e. the frequency of the word "bank")? and how do you
> allocate your resources to optimally estimate 10^7 parameters (a
> speech recognition language model)?
This problem is too "diffuse" for me, but generally speaking costs
and constraints ARE formally taken into account in the sampling
literature. I don't think you will find them as explicitly
addressed in the experimental design field, but the newer optimal
design approaches could certainly be adapted to yield an 'optimum'
design within a cost constraint. (Have a look at Nam Nguyen's or
Dennis Lin's work on near-orthogonal designs, supersaturated designs
and so on. Once the orthogonality requirement is relaxed, lots of
designs become possible.)
But I agree: optimum design for woolly/fuzzy objectives is difficult.
> and finally, given multiple competing goals with specified value
> (political value, mostly), how do you come out smelling like a rose?
If this is seen as a sampling or experimental design problem, it
is most certainly addressed in the literature.
>
John Aitchison <jaitchison@acm.org>
Data Sciences Pty Ltd
Sydney, AUSTRALIA.