Statistics for inter-group comparison on categorization tasks

Ken Litkowski (71520.307@CompuServe.COM)
05 Jul 96 14:23:00 EDT

I just came in at the end of this discussion, so I didn't have the opportunity
to take part in its earlier phases. However, I would like to make some
observations which I hope are relevant.

I have been fortunate enough to have had some exposure to content analysis
methods from a program evaluation perspective in addition to having a
computational linguistics perspective. After listening to one of the papers
cited by Jean Carletta, I mentioned the Krippendorff method as a potentially
useful approach to the classification problem, so I wholeheartedly subscribe to
her call for increased use of such content analysis techniques in NLP.

In addition, I would like to point out that Krippendorff's method is primarily
designed to deal with human interrater reliability, whereas computerized content
analysis (a problem domain still in its nascent stage) can remove this as an
issue. So I would suggest that we in the CL and NLP communities would do well
not only to take notice of this reliability issue from content analysis, but
also to take notice of the larger problems facing this field.

Content analysis is primarily concerned with the categorization of content, and
we in the CL and NLP communities should strive to bring our knowledge of
semantics to the goal of developing categories for content analysis. My
fortuitous encounter
with a sociologist who has been developing a computer content analysis method
for some 20 years has suggested to me several possibilities for improved
category development that draw upon the notions of semantic fields and the
semantic network embodied in WordNet. Moreover, this specific content analysis
method has strong affinities with the semantic vector approach to information
retrieval developed by Liz Liddy at Syracuse.

I would commend to those interested the paper by Don McTavish available at my
web site, describing these ideas in more concrete detail.

Ken Litkowski
CL Research
20239 Lea Pond Place
Gaithersburg, MD 20879-1270 USA
Tel.: 301-926-5904
Email: ken@clres.com, 71520.307@compuserve.com
Home Page: http://www.clres.com