Corpora: Summarization of HTML Documents

Noemi Preissner (noemi@CoLi.Uni-SB.DE)
Wed, 6 Aug 1997 15:52:41 +0200 (MET DST)

Hi,

I would like to automatically summarize the HTML documents that
a search engine returns for a given query. I am interested in
two kinds of summary: a tailored summary, which takes the query
keywords into account and might, e.g., consist of all the
sentences containing those keywords, and a neutral summary,
which should be independent of the query. The second case is
obviously harder than the first, although I have some
intuitions, such as listing all the headings (which should be
quite easy to detect in an HTML document) or determining
keywords from word frequencies in the document (if a word
occurs very often in the document although it is not that
common in the language in general, it could be considered a
keyword).
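
The three heuristics above (query-keyword sentences, heading
listing, and frequency-based keywords) could be sketched roughly
as follows. This is only an illustration under stated
assumptions: all names (TextAndHeadings, split_sentences,
tailored_summary, keywords_by_frequency) are hypothetical, the
sentence splitter is deliberately naive, and the `background`
table of general-language word frequencies is assumed to come
from some reference corpus for the language in question.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextAndHeadings(HTMLParser):
    """Collect <h1>..<h6> text and the remaining visible text separately."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
    SKIP = {"script", "style"}  # content that is never shown to the reader

    def __init__(self):
        super().__init__()
        self.headings = []
        self.text_parts = []
        self._in_heading = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._in_heading = True
            self.headings.append("")
        elif tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.HEADINGS:
            self._in_heading = False
        elif tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return
        if self._in_heading:
            self.headings[-1] += data
        else:
            self.text_parts.append(data)


def words(text):
    """Lowercased word tokens; \\w is Unicode-aware, so accents and umlauts work."""
    return re.findall(r"\w+", text.lower())


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    text = " ".join(text.split())
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def tailored_summary(sentences, query):
    """Keep every sentence that contains at least one query keyword."""
    keywords = set(words(query))
    return [s for s in sentences if keywords & set(words(s))]


def keywords_by_frequency(doc_words, background, top_n=5, min_count=2, floor=1e-6):
    """Rank words by (relative frequency in document) / (frequency in language).

    `background` maps word -> relative frequency in a reference corpus
    (an assumed resource); unseen words fall back to `floor`.
    """
    counts = Counter(doc_words)
    total = sum(counts.values())
    scores = {
        w: (c / total) / background.get(w, floor)
        for w, c in counts.items()
        if c >= min_count
    }
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


# Usage on a toy page; the background frequencies are made up for illustration.
page = """<html><body><h1>Parsing German</h1>
<p>Parsing is hard. The parser handles German compounds. The weather is nice.</p>
<h2>Results</h2><p>The parser is fast.</p></body></html>"""
parser = TextAndHeadings()
parser.feed(page)
body = " ".join(parser.text_parts)
neutral = parser.headings  # heading list as a query-independent summary
tailored = tailored_summary(split_sentences(body), "German parser")
keywords = keywords_by_frequency(
    words(body), {"the": 0.05, "is": 0.01, "parser": 0.00001}, top_n=1
)
```

Matching on exact word forms will miss inflected variants, which
matters more for French and German than for English; stemming or
lemmatization would be a natural refinement but is not shown here.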

I would like to summarize English, French and German texts,
and I would be very grateful for further suggestions. I am
also interested in literature on this subject, so thanks in
advance for any hints!

Noemi (noemi@coli.uni-sb.de)

P.S.: Sorry if you receive this email twice; I accidentally
tried to post it from my other, non-member account, which
seems to take a little longer.

----- End of forwarded message from Noemi Preissner -----