Re: authorship testing

Jeff Adams (jeffa@kurz-ai.com)
Fri, 2 Feb 1996 15:18:38 -0500

> the trick in author id is that you have to look only at the items
> which have nothing to do with content (and everybody assumes that the
> language is constant). this generally requires a considerable amount
> of human involvement in choosing the features of interest.

The one study I read chose a few dozen common content-free words
(a, of, with, to, the, etc.), computed their average number of
occurrences in various fixed-length blocks of text, and performed
MANOVA on the results to see whether the differences among the
averages were statistically significant. (At least that's how I
remember it; it's been several years...)
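
To make the counting step concrete, here is a rough sketch in
Python of how I understand it. It is not the study's actual
procedure: the word list is only the handful of examples above,
the block size is arbitrary, and the names block_counts and
average_counts are made up for illustration. The MANOVA step
itself is not shown.

```python
import re
from statistics import mean

# Hypothetical word list: only the examples named above, not the
# study's real list of a few dozen words.
FUNCTION_WORDS = ["a", "of", "with", "to", "the"]

def block_counts(text, block_size=1000, words=FUNCTION_WORDS):
    """Cut text into fixed-length blocks of tokens and count each
    function word in every block. Returns one row per block."""
    tokens = re.findall(r"[a-z']+", text.lower())
    rows = []
    # Keep full blocks only, so every row covers the same number
    # of tokens; a short trailing remainder is dropped.
    for start in range(0, len(tokens) - block_size + 1, block_size):
        block = tokens[start:start + block_size]
        rows.append([block.count(w) for w in words])
    return rows

def average_counts(rows):
    """Average each word's count over all blocks (one mean per word)."""
    return [mean(column) for column in zip(*rows)]
```

The rows for each candidate author (or each disputed section)
would then be the groups handed to a multivariate test such as
MANOVA. Note how much is left to the analyst here: the word list,
the block size, and where the blocks start.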

When I read their study (which concluded that the differences in
averages were significant, & hence that multiple authors were
involved, rather than a single author), I was a bit skeptical.
Just as Ted describes, it seems there was a little too much human
involvement in picking the "content-free words" & the blocks of
text. At least, too much for me to be convinced by the study.

Most of all, I suppose, I remain skeptical of the fundamental
underlying assumption that a person has a distinctive and
unchanging "wordprint." There are assumptions made that these
"wordprints" are consistent
1. across time,
2. across genres, and
3. across "voices" or characters.

Have any good, honest tests been done to check, for example:
1. whether what I wrote 10 years ago "matches" what I write now;
2. whether my technical reports, my letters to my mom, and my
attempts at short stories all have the same "wordprint";
3. whether the "wordprints" of different characters, say, in a novel,
all match the "wordprint" of the author?

Further (for example, in the case of studies on Biblical authorship,
like whether Isaiah had 1 or 2 authors), there is the occasional
assumption that wordprints survive translation.

Never mind the techniques for now: are there studies to indicate
that the whole topic is valid in the first place?

Jeff

-- 
Jeff Adams
Language Modeling Scientist
Kurzweil Applied Intelligence
http://www.kurz-ai.com/people/jeffa