[Corpora-List] topic identification literature

From: Laurel S Stvan (STVAN@uta.edu)
Date: Sun Jul 07 2002 - 20:26:00 MET DST

    Dear fellow researchers,

    A colleague and I are working on a project concerning topic identification.
    He's more computational and I'm more linguistic, so at first we had to
    negotiate what we meant by topic. Essentially, we are looking at ways to
    abstract a given web page to see if it matches a particular topic. We'll
    have access to POS tags, frequency info, HTML code, and WordNet info.
    Here's my question: Is there a widely accepted way to use these pieces of
    info to identify the topic of pictures on a page, or do people each cobble
    together their own identification techniques?

    I'm familiar with Hovy and Lin 1999 and the material on the ACM SIGIR site,
    but I'm curious if there is any linguistic literature that is a touchstone
    on web document topic identification. Leads to any relevant literature would
    be appreciated. I'll be happy to post a summary.

    Thanks.

    ---------------------------------------------
    Laurel Smith Stvan
    Assistant Professor
    Program in Linguistics
    University of Texas at Arlington
    http://ling.uta.edu/~laurel
    stvan@uta.edu
    ---------------------------------------------


