Corpora: Summary: automatic thesaurus generation

From: Adam Przepiórkowski (adamp@ipipan.waw.pl)
Date: Sun Jan 27 2002 - 10:55:43 MET


    This is a brief summary of responses to my query regarding automatic
    thesaurus generation from large corpora. I am very grateful to Bob
    Krovetz, Johan Hagman, Sara Rydin and Bill Mann for helpful
    suggestions.

    The following people have worked or are working on automatic generation of
    meaningful hierarchical thesauri:

    Sharon Caraballo (esp. her recent Ph.D. dissertation available from
      her home page);
    Marti Hearst (a 1992 paper available from Marti Hearst's home page);
    Gregory Grefenstette (I found it more difficult to locate relevant
      papers);
    Johan Hagman (results will be presented at JADT
      http://www.irisa.fr/manifestations/2002/JADT/programme.htm#programme);
    Sara Rydin (started work on this for her Ph.D. thesis).

    Virtually all of the work I located concentrates on automatic
    detection of hyponymy/hypernymy relations on the basis of textual
    clues such as "X, including x, y and z" (this normally implies that x,
    y and z are kinds of X).
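
    To make the idea concrete, here is a minimal Python sketch of matching
    that one clue. It is an illustration only, not the method of any of the
    authors listed above: the names PATTERN and extract_hyponyms are made
    up for this example, and it assumes single-word terms in plain text,
    whereas real systems typically work on tagged or parsed noun phrases.

```python
import re

# Hypothetical pattern for the clue "X, including x, y and z".
# Assumption: every term is a single word; the lookahead keeps "and"
# from being consumed as a list item in "x, y, and z".
PATTERN = re.compile(
    r"(?P<hyper>\w+),? including "
    r"(?P<hypos>\w+(?:, (?!and )\w+)*(?:,? and \w+)?)"
)


def extract_hyponyms(text):
    """Return (hyponym, hypernym) pairs suggested by the 'including' clue."""
    pairs = []
    for match in PATTERN.finditer(text):
        hypernym = match.group("hyper")
        # Split the enumeration on commas and on the final "and".
        for hyponym in re.split(r",? and |, ", match.group("hypos")):
            pairs.append((hyponym, hypernym))
    return pairs


if __name__ == "__main__":
    sentence = "He plays instruments, including guitar, piano and drums."
    for hyponym, hypernym in extract_hyponyms(sentence):
        print(hyponym, "is a kind of", hypernym)
```

    A single pattern like this will both miss many relations and produce
    false hits, so it is best read as a starting point for experiments.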

    Bill Mann also mentions the Oingo search engine which, it is
    claimed, actually takes advantage of such techniques.

    Best,

    -- 
    	Adam P.
    


