Re: Corpora: Parsing morphologically rich languages

From: Martin Wynne (martin@clg.bham.ac.uk)
Date: Mon Jan 22 2001 - 12:31:57 MET

    The EAGLES 'Recommendations for the Morphosyntactic Annotation of
    Corpora' (available at
    http://www.ilc.pi.cnr.it/EAGLES/annotate/annotate.html) provide a
    formalism which can deal with values for multiple morphosyntactic
    categories in a single tag, and also has facilities for dealing with
    underspecification and ambiguity. The tag is a linear string of
    characters, where each character represents a value for a particular
    morphosyntactic feature. For example (from the document cited above):

    - A common noun, feminine, plural, countable, is represented: N122010
    - A 3rd person, singular, finite, indicative, past tense, active, main verb,
      non-phrasal, non-reflexive, verb is represented: V3011141101200
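
    As a rough illustration of how such a positional tag could be unpacked,
    here is a minimal Python sketch for the noun example only. The attribute
    order and the value codes are assumptions read off the example N122010
    and should be checked against the EAGLES document; the names
    NOUN_ATTRIBUTES and decode_noun_tag are hypothetical, and treating "0"
    as "not specified or not applicable" is likewise an assumption.

```python
# Illustrative only: attribute order and value codes are assumptions inferred
# from the example tag N122010, not copied from the EAGLES tables.
NOUN_ATTRIBUTES = [
    ("type", {"1": "common", "2": "proper"}),
    ("gender", {"1": "masculine", "2": "feminine", "3": "neuter"}),
    ("number", {"1": "singular", "2": "plural"}),
    ("case", {}),  # value codes are not given in the example
    ("countability", {"1": "countable", "2": "mass"}),
    ("definiteness", {}),  # value codes are not given in the example
]


def decode_noun_tag(tag):
    """Map a positional noun tag such as 'N122010' to a feature dictionary."""
    if not tag.startswith("N"):
        raise ValueError("not a noun tag: " + tag)
    codes = tag[1:]
    if len(codes) != len(NOUN_ATTRIBUTES):
        raise ValueError("unexpected tag length: " + tag)
    features = {}
    for (name, values), code in zip(NOUN_ATTRIBUTES, codes):
        if code == "0":
            # Assumed meaning: attribute not specified or not applicable.
            continue
        # Unknown codes are kept in raw form rather than guessed at.
        features[name] = values.get(code, "code " + code)
    return features


print(decode_noun_tag("N122010"))
# {'type': 'common', 'gender': 'feminine', 'number': 'plural', 'countability': 'countable'}
```

    Because each feature sits at a fixed position, a grammar or a matching
    routine can refer to one feature (say, number) without enumerating every
    complete tag.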
      
    As far as I know, these recommendations were drawn up for, and have
    mainly been used with, West European languages such as English, French
    and Italian, but it seems to me that they could usefully be applied to
    more morphologically rich inflectional languages.

    Martin

    On Fri, Jan 12, 2001 at 03:18:34PM +0100, Alexander Mikhailian <mikhailian@altern.org> wrote:
    > Hello,
    >
    > I am looking for references to syntactic parsers
    > that deal with morphologically rich flexive languages.
    >
    > In particular, I am interested in :
    >
    > 1. Approaches to deal with the number of POS tags
    > (terminals) that would supposedly be larger
    > than for English or French, e.g. if one tries
    > to build a list of POS tags for a morphologically
    > rich language in order to follow approaches
    > developed for English, this list may easily grow up
    > to thousands of entries which implies that grammars
    > using such a huge list of terminals would be quite
    > complicated.
    >
    > 2. Approaches to deal with the free or loosely
    > restricted order of words that is often proper to
    > morphologically rich languages and which requires
    > different parsing techniques than for English,
    > where a common shift/reduce parser is often sufficient.
    >
    > Thanks in advance,
    >
    > --
    > Alexander Mikhailian
    >
    >
    >

    -- 
    Martin Wynne			Centre for Corpus Research,
    Coordinator, TRACTOR Network	Department of English,
    www.tractor.de			Birmingham University
    Tel: +44 (0)121 414 2763	Birmingham
    Fax: +44 (0)121 414 6053	UK - B15 2TT
    email: martin@clg.bham.ac.uk
    