[Corpora-List] Annotation without lexicons

From: Mark Davies (mdavies@ilstu.edu)
Date: Tue Jan 28 2003 - 11:22:07 MET


    Corpus annotation is of course usually done with the aid of a lexicon
    containing POS and lemma information. But imagine that you need to tag and
    lemmatize a 1-2 million word corpus of a language for which you do not have
    a lexicon. A variant of this might be the need to annotate a corpus from
    an older stage of a language -- e.g. Middle English or Old Spanish --
    which is related to a modern language for which you do have a lexicon. How
    is this best done?

    I've had to address this issue in creating several different corpora and
    have developed my own approach to the problem, but I'm interested in
    alternative approaches that others might have taken. I realize that this
    might be a FAQ, but any pointers to relevant literature would be
    helpful. Thanks in advance.

    Mark Davies

    ====================================================
    Mark Davies, Associate Professor, Spanish Linguistics
    4300 Foreign Languages, Illinois State University, Normal, IL 61790-4300
    309-438-7975 (voice) / 309-438-8083 (fax)
    http://mdavies.for.ilstu.edu
    ** Historical and dialectal Spanish and Portuguese syntax **
    ** Corpus design and use / Web-database scripting / Distance education **
    =====================================================
