Corpora: Advances in Minority Language NLP / MT Applications

From: MIT2USA@aol.com
Date: Sun May 14 2000 - 00:58:49 MET DST


    MIT2 software solutions for preparing Creole languages for porting
    to popular off-the-shelf computer applications and embedded MT
    systems demonstrated in Seattle

    Marilyn Mason, CEO of Mason Integrated Technologies Ltd (MIT2),
    demonstrated a research prototype of its proprietary orthography
    conversion software for sparse data languages at both the 3rd
    International Controlled Language Applications Workshop
    (CLAW2000) and the Language Technology Joint Conference for
    Applied Natural Language Processing and the North American
    Chapter of the Association for Computational Linguistics
    (ANLP-NAACL2000), held in Seattle, WA, from April 29 to May 4, 2000.

    The Creole version of this conversion software is being prepared for
    market as CreoleConvert(tm). Paired with CreoleScan(tm), MIT2's
    proprietary optical character recognition (OCR) solution, these tools
    serve as a prototype workflow for electronic corpus entry and corpus
    cleansing in languages with a high incidence of lexical and
    orthographic variation.
     
    These processes constitute an essential "middleware" step in
    preparing sparse-data and minority languages for porting to other
    language technology tools, such as spell checkers, machine
    translation systems, and speech-to-text and text-to-speech
    applications.

    MIT2 intends to act as a coordinating agent, enabling linguists,
    native-speaker end users, language development leaders, and
    corpus builders to coordinate their activities and conform to
    established protocols, formats, and conventions for data tagging.
    In this way, these precious electronic materials can not only meet
    the short-term goal of providing a standardized literature base,
    but can also be reused as the building blocks for future language
    technology tools for these languages.

    Because orthographic and lexical standardization are the foundation
    of spell-checking, authoring, and translation tools, MIT2 is now
    developing this technology further in-house in order to provide
    minority languages with consistent and coherent standardization
    strategies that optimize authoring and translation tasks.

    This novel "middleware" approach to porting languages that have
    thus far "missed out on most of the benefits of the Electronic Age"
    drew considerable interest from representatives of some of the
    biggest players in NLP and MT systems development, who also
    attended CLAW2000 and ANLP-NAACL2000.

    These processes will be further described and demonstrated at the
    2nd International Language Resources and Evaluation Conference
    (LREC2000) and the LREC2000 Workshop on "Developing language
    resources for minority languages: re-useability and strategic
    priorities", to be held 29 May to 2 June 2000 in Athens, Greece.
    Ms. Mason will deliver the papers "Issues from corpus analysis that
    have influenced the on-going development of various Haitian Creole
    text- and speech-based NLP systems" and "The State of the Art of
    French Creole Language Resource Engineering".

    Located in Boston, Massachusetts (USA), MIT2 fosters research
    and development activity on behalf of French-, Portuguese-, and
    English-related Creoles, as well as other minority and vernacular
    languages, and is actively seeking corporate investment capital
    and corporate strategic partnering relationships.
     
    For more information, please contact:
    Mason Integrated Technologies Ltd (MIT2)
    P.O. Box 181015, Boston, Massachusetts 02118 USA
    Tel: (+1) 617-247-8885, Fax: (+1) 617-262-8923
    E-mail: mit2usa@aol.com
    MIT2 Web Page: http://hometown.aol.com/mit2usa/Index2.html

    *******
    MIT2 President's Update:
        http://hometown.aol.com/mit2usa/Update3-2000.htm
    Introducing CreoleScan(tm) and CreoleConvert(tm):
        http://hometown.aol.com/mit2usa/IntroCrScCrConv.htm
    Orthographically Converted HC Texts Download Site:
       http://hometown.aol.com/mit2haiti/Index4.html
    Meet Marilyn Mason:
       http://hometown.aol.com/marilinc/Index3.html
    MIT2 Job Opportunities:
       http://hometown.aol.com/mit2usa/JobOpps.html


