Corpora: Fast Transformation-Based Toolkit

From: Radu Florian (rflorian@cs.jhu.edu)
Date: Thu Oct 11 2001 - 23:58:41 MET DST

    The fnTBL Toolkit
    -----------------

    The Natural Language Processing Group at Johns Hopkins University is
    happy to announce the availability of fnTBL 1.0, a fast implementation
    of Transformation-Based Learning (TBL).

    Transformation-based learning is an error-driven machine learning
    technique that works by first assigning each sample its most likely
    class and then iteratively selecting and applying the transformation
    rule that yields the greatest reduction in error rate.
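    To make the procedure above concrete, here is a minimal Python sketch of
    the basic (non-accelerated) TBL training loop. It is not fnTBL's code or
    API. It assumes a toy part-of-speech setting with a single hypothetical
    rule template ("change tag A to B when the previous tag is C"), applies
    each rule to all matching positions at once, and scores a rule as the
    number of errors it fixes minus the number of correct tags it breaks.

```python
from collections import Counter, defaultdict


def initial_tags(words, lexicon, default):
    """Initial state: give each word its most frequent training tag."""
    return [lexicon.get(w, default) for w in words]


def apply_rule(tags, rule):
    """Apply rule (from_tag, to_tag, prev_tag) to every matching position.

    Conditions are tested on the tags as they were *before* this pass.
    """
    from_tag, to_tag, prev_tag = rule
    new_tags = list(tags)
    for i in range(1, len(tags)):  # position 0 has no previous tag
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            new_tags[i] = to_tag
    return new_tags


def score(tags, gold, rule):
    """Net error reduction: errors fixed (good) minus correct tags broken (bad)."""
    from_tag, to_tag, prev_tag = rule
    good = bad = 0
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            if gold[i] == to_tag:
                good += 1
            elif gold[i] == from_tag:
                bad += 1
    return good - bad


def train(words, gold, min_gain=1):
    """Greedy TBL: repeatedly pick and apply the highest-scoring rule."""
    counts = defaultdict(Counter)
    for w, t in zip(words, gold):
        counts[w][t] += 1
    lexicon = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    default = Counter(gold).most_common(1)[0][0]  # fallback for unseen words

    tags = initial_tags(words, lexicon, default)
    rules = []
    while True:
        # Candidate rules are instantiated from the errors that remain;
        # sorting makes tie-breaking deterministic.
        candidates = sorted({(tags[i], gold[i], tags[i - 1])
                             for i in range(1, len(tags)) if tags[i] != gold[i]})
        if not candidates:
            break
        best = max(candidates, key=lambda r: score(tags, gold, r))
        if score(tags, gold, best) < min_gain:
            break
        tags = apply_rule(tags, best)
        rules.append(best)
    return lexicon, default, rules


def tag(words, lexicon, default, rules):
    """Tag new text: initial assignment, then the learned rules in order."""
    tags = initial_tags(words, lexicon, default)
    for rule in rules:
        tags = apply_rule(tags, rule)
    return tags


if __name__ == "__main__":
    # Toy corpus: "run" is usually NN, but VB after "to".
    words = ["a", "run", "a", "run", "a", "run", "to", "run"]
    gold = ["DT", "NN", "DT", "NN", "DT", "NN", "TO", "VB"]
    lexicon, default, rules = train(words, gold)
    print(rules)                                       # [('NN', 'VB', 'TO')]
    print(tag(["to", "run"], lexicon, default, rules))  # ['TO', 'VB']
```

    Because every accepted rule lowers the training error by at least
    min_gain, the loop always terminates. This naive version rescores every
    candidate rule on every iteration, which is the kind of cost a faster
    implementation such as fnTBL is designed to reduce.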

    The fnTBL toolkit is designed for large, dynamic classification tasks
    such as those common in Natural Language Processing (part-of-speech
    tagging, base noun phrase chunking, or word sense disambiguation), but
    it can be used for any classification task with symbolic features.
    fnTBL dramatically reduces running time compared with the original TBL
    algorithm proposed by Eric Brill, achieving a speed-up of up to two
    orders of magnitude while maintaining the same performance.

    Some of the features of the fnTBL toolkit:

    - it supports a large number of symbolic features and feature types
    (including bag-of-words features, identity features, subword
    features, prefix/suffix features, etc.);

    - it has a flexible architecture in which feature types are easy to
    create, add, remove, or modify, making the toolkit useful for
    rapidly deploying a classifier for a particular task;

    - new tasks are easy to set up: a large pool of feature types is
    already implemented, and some Perl tools for data processing are provided;

    - models for basic NLP tasks in English (part-of-speech tagging, base
    noun phrase chunking, and text chunking) are already trained and
    included in the distribution; others (e.g., Swedish part-of-speech
    tagging) can be downloaded from the web site;

    - multitask, simultaneous classification is supported (e.g., learning
    to perform word segmentation jointly with POS tagging for Chinese);

    - the resulting rules often carry easy-to-understand linguistic content,
    which can offer insight into the problem's behavior.
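    The feature types listed above can be illustrated with a small Python
    sketch. The feature names, window size, and rule representation below
    are hypothetical and chosen for illustration; they are not fnTBL's
    actual feature or rule file format. Each token is described by identity,
    prefix/suffix, and a simple subword feature, plus a bag of nearby words,
    and a rule fires when the current tag and all of its conditions match.

```python
from dataclasses import dataclass


def extract_features(words, i, affix_len=3, window=2):
    """Hypothetical feature set for token i (names are illustrative only)."""
    w = words[i]
    lo, hi = max(0, i - window), min(len(words), i + window + 1)
    return {
        "word": w,                        # identity feature
        "prefix": w[:affix_len],          # prefix feature
        "suffix": w[-affix_len:],         # suffix feature
        "capitalized": w[:1].isupper(),   # a simple subword feature
        # bag-of-words feature: nearby words, with their positions ignored
        "bag": frozenset(words[j] for j in range(lo, hi) if j != i),
    }


@dataclass(frozen=True)
class Rule:
    from_tag: str
    to_tag: str
    conditions: tuple  # (feature_name, value) pairs; all must hold

    def matches(self, feats, current_tag):
        if current_tag != self.from_tag:
            return False
        for name, value in self.conditions:
            if name == "bag":
                # For the bag feature, the condition is set membership.
                if value not in feats["bag"]:
                    return False
            elif feats.get(name) != value:
                return False
        return True


def apply_feature_rule(rule, words, tags):
    """Apply a rule to every matching token, testing conditions on the input tags."""
    feats = [extract_features(words, i) for i in range(len(words))]
    return [rule.to_tag if rule.matches(f, t) else t for f, t in zip(feats, tags)]


if __name__ == "__main__":
    suffix_rule = Rule("NN", "VBG", (("suffix", "ing"),))
    print(apply_feature_rule(suffix_rule, ["She", "is", "running", "home"],
                             ["PRP", "VBZ", "NN", "NN"]))
    # ['PRP', 'VBZ', 'VBG', 'NN']
    bag_rule = Rule("NN", "VB", (("bag", "to"),))
    print(apply_feature_rule(bag_rule, ["I", "want", "to", "run"],
                             ["PRP", "VBP", "TO", "NN"]))
    # ['PRP', 'VBP', 'TO', 'VB']
```

    Keeping each rule to a source tag plus a short conjunction of feature
    tests is what makes learned rules like "NN becomes VBG when the suffix
    is -ing" easy to read.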

    =========
    Download
    =========
    fnTBL version 1.0 is public domain software and can be downloaded from
    the main web site:

        http://nlp.cs.jhu.edu/~rflorian/fntbl/index.html

    When downloading the software, you will be invited to join the fnTBL
    mailing list (fnTBLtk@nlp.cs.jhu.edu).

    For more information about fnTBL, please refer to the documentation
    at:

       http://nlp.cs.jhu.edu/~rflorian/fntbl/documentation.html

    The documentation can also be downloaded separately as a PostScript or
    PDF file from:

       http://nlp.cs.jhu.edu/~rflorian/fntbl/fnTBL-toolkit.ps.gz
    or
       http://nlp.cs.jhu.edu/~rflorian/fntbl/fnTBL-toolkit.pdf.gz

    The software package contains the C++ source code of the program and a
    number of useful Perl scripts, including an almost turn-key solution
    for training and/or testing a POS tagger. A small number of test cases
    and three pre-trained rule-based systems (English POS tagging, English
    base NP chunking, and English text chunking) are also provided. The
    software is easy to set up on most Unix systems; it has also been
    tested on Windows (Cygwin).

    We hope that the fnTBL toolkit will prove useful to you,

                                    Radu Florian and Grace Ngai
                                    Natural Language Processing Group
                                    Johns Hopkins University
