Re: [Corpora-List] Multiword Expressions

From: Lars Aronsson (lars@aronsson.se)
Date: Wed Jan 14 2004 - 14:41:07 MET


    Anna Korhonen wrote:

    > CALL FOR PAPERS
    > [...]
    > In recent years, there has been a growing awareness in the NLP community
    > of the problems that Multiword Expressions (MWEs) pose and the need for
    > their robust handling.
    >
    > MWEs include a large range of linguistic phenomena, such as phrasal verbs
    > (e.g. "add up"), nominal compounds (e.g. "telephone box"), and
    > institutionalized phrases (e.g. "salt and pepper"). These expressions,
    > which can be syntactically and/or semantically idiosyncratic in nature,
    > are used frequently in everyday language, usually to express precisely
    > ideas and concepts that cannot be compressed into a single word.

    I'm not a linguist, and didn't know there was a word for MWEs until
    now.

    Is there any freely available open source software for spell checking
    (or natural language parsing) that handles multiword expressions? I
    want an algorithm that can approve "nota bene", "ad notam" and "San
    Francisco" (if these MWEs are in the dictionary) in an English text
    without approving the member words on their own.
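
    One way this could work, as a minimal sketch rather than a finished
    checker: tokenize the text, then at each position try the longest
    run of tokens first and accept it only if the whole run is a
    dictionary entry. The function name check_text, the max_len limit,
    and the tiny LEXICON are all made up for illustration. Matching is
    exact and case-sensitive, and the greedy longest-match strategy is
    just one possible choice.

```python
import re


def check_text(text, lexicon, max_len=3):
    """Return the tokens that are not covered by any lexicon entry.

    At each position the longest run of tokens is tried first, so a
    multiword entry such as "nota bene" is accepted as a unit while
    "nota" on its own is still reported as unknown.
    """
    # Simplifying assumption: a token is a run of ASCII letters.
    tokens = re.findall(r"[A-Za-z]+", text)
    unknown = []
    i = 0
    while i < len(tokens):
        # Try runs of max_len tokens, then shorter ones, down to one token.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in lexicon:
                i += n  # the whole run is approved as one entry
                break
        else:
            # No entry of any length starts at this token.
            unknown.append(tokens[i])
            i += 1
    return unknown


# Hypothetical toy dictionary: the multiword entries are listed, but
# their member words ("nota", "bene", "San", "Francisco") are not.
LEXICON = {
    "nota bene", "ad notam", "San Francisco",
    "I", "took", "the", "train", "to",
}

print(check_text("I took the train to San Francisco", LEXICON))
print(check_text("I took the train to Francisco", LEXICON))
```

    With this toy dictionary the first call reports nothing and the
    second reports "Francisco", because only the two-word entry is
    listed.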

    Free spell checkers such as ispell, aspell, and myspell don't seem to
    have this ability. They seem to split the input text into words
    independently of what's in the dictionary.

    Dictionary-driven parsing could also be useful for abbreviations,
    hyphenation, and SillyCapitalization.

    -- 
      Lars Aronsson (lars@aronsson.se)
      Aronsson Datateknik - http://aronsson.se/
    


