Corpora: Sum: closed class word list

From: Diego Molla
Date: Thu Jun 06 2002 - 04:34:27 MET DST

    A few days ago I asked whether a list of closed class words is
    available. Thank you for all the responses I received; here is a
    brief summary.

    First of all, some respondents said that there is no clear definition
    of what a closed class word is. For example, several people suggested
    using a list of stop words.

    My student is going to localise WordNet to the domain of software
    documentation manuals, and one step in this process is the addition of
    words from our corpus that are not defined in WordNet. Since WordNet
    contains nouns, verbs, adjectives, and adverbs, he needs to find a way
    to filter out those words that belong to other parts of speech.

    So, for our application, closed classes are parts of speech other than
    nouns, verbs, adjectives, and adverbs.
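
    The filtering step described above could be sketched roughly as
    follows. This is only an illustration: the function name
    candidate_new_words and the three small word collections are
    hypothetical stand-ins for the corpus vocabulary, the WordNet entries,
    and a closed class word list, none of which are specified here.

```python
from typing import Iterable, Set


def candidate_new_words(
    corpus_words: Iterable[str],
    wordnet_lemmas: Iterable[str],
    closed_class_words: Iterable[str],
) -> Set[str]:
    """Return corpus words that are neither known lemmas nor closed class words.

    All three arguments are plain collections of strings; matching is
    case-insensitive. (Hypothetical helper, not part of WordNet.)
    """
    known = {w.lower() for w in wordnet_lemmas}
    closed = {w.lower() for w in closed_class_words}
    return {w.lower() for w in corpus_words} - known - closed


# Toy data, made up for illustration only.
corpus = ["The", "installer", "of", "the", "daemon", "reboots", "and", "file"]
wordnet = {"file", "installer"}
closed_class = {"the", "of", "and"}

print(sorted(candidate_new_words(corpus, wordnet, closed_class)))
```

    Inflected forms such as "reboots" survive this simple set difference,
    so a real pipeline would probably need to lemmatise the corpus words
    first.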

    One way to find these words is to take a list of words annotated with
    their part of speech and select those that are not nouns, verbs,
    adjectives, or adverbs. Fuchung Peng did something like that, and he
    sent me a list of words tagged as DT, CC, PRP, PRP$, TO, WDT, WP$, WRB,
    or WP in the Brown corpus. Thank you for the list; I'll probably give
    it to my student. Anyone interested in the list can contact me and
    I'll send it by email.
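
    For readers who want to build a similar list themselves, here is a
    minimal sketch of the tag-based selection. It assumes the tagged data
    is available as (word, tag) pairs that use the tag names listed above;
    the function name closed_class_words and the sample sentence are made
    up for illustration, and this is not necessarily how the list
    mentioned above was produced.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# The tags named in the message above.
CLOSED_CLASS_TAGS: FrozenSet[str] = frozenset(
    {"DT", "CC", "PRP", "PRP$", "TO", "WDT", "WP$", "WRB", "WP"}
)


def closed_class_words(
    tagged_words: Iterable[Tuple[str, str]],
    tags: FrozenSet[str] = CLOSED_CLASS_TAGS,
) -> Set[str]:
    """Collect the lowercased words whose tag is in the given tag set."""
    return {word.lower() for word, tag in tagged_words if tag in tags}


# Hypothetical tagged sample; a real run would read a tagged corpus instead.
sample = [
    ("The", "DT"), ("student", "NN"), ("and", "CC"), ("he", "PRP"),
    ("filters", "VBZ"), ("the", "DT"), ("words", "NNS"), ("to", "TO"),
]

print(sorted(closed_class_words(sample)))  # ['and', 'he', 'the', 'to']
```

    Using a set removes duplicates, so each word appears once regardless
    of how often it occurs in the tagged data.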

    Best regards to all and again, thank you to all who replied to my message.



    ---------------------------------------------------------------------
    Diego MOLLA ALIOD
    Department of Computing
    Macquarie University
