Re: [Corpora-List] corpus ------>>>>> thesaurus

From: Paul Buitelaar (paulb@dfki.de)
Date: Tue Nov 16 2004 - 12:54:00 MET

    Dear Vladimir and all,

    The acquisition of a domain thesaurus from a domain-specific corpus (or,
    more generally, a text collection) is closely related to current work
    on ontology learning/extraction from text.

    An ontology (as used currently in the Semantic Web context and based on
    previous incarnations in the context of expert systems and similar)
    represents 'a set of concepts and relations between these concepts that
    are relevant to a particular domain of discourse'. Similarly, a
    thesaurus for a particular domain represents a set of terms and a
    selected set of relations between these terms (e.g. 'broader term',
    'narrower term') -- but notice the difference in 'term' vs. 'concept'.
    There is currently much discussion on the status of thesauri in the
    Semantic Web context; see, for example, developments on SKOS ('an RDF
    vocabulary for describing thesauri, glossaries, taxonomies,
    terminologies'): http://www.w3.org/2004/02/skos/
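
    To make the term-based view concrete, here is a minimal sketch in
    Python of a thesaurus as a set of terms linked by 'broader term' and
    'narrower term' relations, printed as SKOS-style statements. The class
    name Thesaurus, its methods and the financial terms are invented for
    illustration only; the output is plain strings, not real RDF, and only
    the property names skos:broader and skos:narrower come from SKOS.

```python
from collections import defaultdict


class Thesaurus:
    """Toy thesaurus: terms plus 'broader term' / 'narrower term' links."""

    def __init__(self):
        self.broader = defaultdict(set)   # term -> its broader terms
        self.narrower = defaultdict(set)  # term -> its narrower terms

    def add_broader(self, term, broader_term):
        # Record both directions so the two relations stay consistent.
        self.broader[term].add(broader_term)
        self.narrower[broader_term].add(term)

    def triples(self):
        # Yield SKOS-style (subject, property, object) statements as strings.
        for term in sorted(self.broader):
            for bt in sorted(self.broader[term]):
                yield (term, "skos:broader", bt)
                yield (bt, "skos:narrower", term)


# Hypothetical financial terms, chosen only as an example.
thesaurus = Thesaurus()
thesaurus.add_broader("corporate bond", "bond")
thesaurus.add_broader("government bond", "bond")
thesaurus.add_broader("bond", "security")

for subject, prop, obj in thesaurus.triples():
    print(subject, prop, obj)
```

    Note that the entries here are terms (strings), not concepts; an
    ontology would add a further layer that the terms refer to.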

    As mentioned, much current work on ontology learning/extraction from
    text is related to your question. For an overview of some recent
    papers, see the ECAI 2004 workshop on "Ontology Learning and
    Population", where all papers and most presentations can be downloaded:

    http://olp.dfki.de/ecai04/cfp.htm

    The workshop description also has some further links to previous,
    related workshops.

    Hope this helps,

        Paul Buitelaar
        DFKI - Language Technology &
        Competence Center Semantic Web
        Saarbruecken, Germany

        http://www.dfki.de/~paulb/

    > I would be very grateful to anyone for any info concerning
    > compiling thesaurus from corpus (esp. from corpus of specific domain
    > documents).
    >
    > As example - thesaurus of financial terms compiled from financial
    > documents corpus.
    >
    > Best wishes to all our corpus society!
    >
    > --
    > Regards Vladimir Rykov
    >
    > PhD in Computational Linguistics
    > Personal web-site: rykov.narod.ru
    > mailto: rykov2000@mail.ru
    > Si etiam omnes - ego non
    > English version: www.blkbox.com/~gigawatt/rykov.html



    This archive was generated by hypermail 2b29 : Tue Nov 16 2004 - 17:13:01 MET