Corpora: Software release: TiMBL 1.0

Jakub.Zavrel@kub.nl
Fri, 20 Mar 1998 13:28:16 +0100 (MET)

----------------------------------------------------------------------
Software release: TiMBL 1.0
Tilburg Memory Based Learner
ILK Research Group, http://ilk.kub.nl/
----------------------------------------------------------------------

The ILK (Induction of Linguistic Knowledge) Research Group at Tilburg
University, The Netherlands, announces the release of TiMBL, Tilburg
Memory Based Learner (version 1.0).

TiMBL is a machine learning program implementing a family of
Memory-Based Learning techniques for discrete data. TiMBL stores a
representation of the training set explicitly in memory (hence
`Memory Based'), and classifies new cases by extrapolating from the
most similar stored cases.
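
As a rough illustration of this idea (not TiMBL's actual code), the
sketch below keeps the training cases in a plain list and classifies
a new case by majority vote among its k nearest stored cases. The
overlap metric, the function names, and the toy data are assumptions
made for this example only.

```python
from collections import Counter

# Illustrative sketch only: names and data are hypothetical, not TiMBL's API.

def overlap_distance(a, b):
    """Count the feature positions at which two instances differ."""
    return sum(x != y for x, y in zip(a, b))

def classify(memory, instance, k=1):
    """Return the majority label among the k stored cases nearest to instance.

    memory is a list of (features, label) pairs kept explicitly in memory.
    """
    ranked = sorted(memory, key=lambda item: overlap_distance(item[0], instance))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy training set with three symbolic features per case.
memory = [
    (("small", "red", "round"), "apple"),
    (("small", "red", "long"), "pepper"),
    (("large", "yellow", "long"), "banana"),
    (("small", "yellow", "round"), "lemon"),
]

print(classify(memory, ("large", "red", "round"), k=1))
```

Here the new case differs from the stored "apple" case in one feature
and from every other case in two, so the nearest neighbour decides.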

TiMBL features the following (optional) metrics and speed-up
optimizations that enhance the underlying k-nearest neighbour
classifier engine:

- Information Gain weighting for dealing with features of differing
importance (the IB1-IG learning algorithm).
- Stanfill & Waltz's / Cost & Salzberg's (Modified) Value Difference
metric for making graded guesses of the match between two
different symbolic values.
- Conversion of the flat instance memory into a decision tree,
and inverted indexing of the instance memory, both yielding
faster classification.
- Further compression and pruning of the decision tree, guided
by feature information gain differences, for an even larger
speed-up (the IGTREE learning algorithm).
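
The first two items can be sketched with their textbook definitions.
The code below computes the information gain of each feature, uses
the gains as weights in an overlap distance, and computes a value
difference as the sum over classes of |P(c|v1) - P(c|v2)|. This is a
minimal sketch under those assumptions; the function names and toy
data are hypothetical, and TiMBL's own implementation may differ in
detail.

```python
import math
from collections import Counter, defaultdict

# Illustrative sketch only: names and data are hypothetical, not TiMBL's API.

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(instances, labels, feature):
    """Label entropy minus the expected entropy after splitting on a feature."""
    by_value = defaultdict(list)
    for inst, label in zip(instances, labels):
        by_value[inst[feature]].append(label)
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in by_value.values())
    return entropy(labels) - remainder

def weighted_overlap(a, b, weights):
    """Sum the weights of the features on which two instances mismatch."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)

def value_difference(instances, labels, feature, v1, v2):
    """Sum over classes of |P(class | v1) - P(class | v2)| for one feature."""
    counts = {v1: Counter(), v2: Counter()}
    for inst, label in zip(instances, labels):
        if inst[feature] in counts:
            counts[inst[feature]][label] += 1
    n1 = sum(counts[v1].values())
    n2 = sum(counts[v2].values())
    if n1 == 0 or n2 == 0:
        raise ValueError("both values must occur in the training data")
    return sum(abs(counts[v1][c] / n1 - counts[v2][c] / n2)
               for c in set(labels))

# Toy data: feature 0 predicts the class perfectly, feature 1 not at all.
instances = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = ["pos", "pos", "neg", "neg"]

weights = [information_gain(instances, labels, f) for f in range(2)]
print(weights)
print(weighted_overlap(("a", "x"), ("b", "x"), weights))
print(value_difference(instances, labels, 0, "a", "b"))
```

With these weights a mismatch on the informative feature counts
fully, while a mismatch on the uninformative one contributes nothing
to the distance.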

TiMBL accepts command-line arguments by which these metrics and
optimizations can be selected and combined. It reads the C4.5 and
WEKA ARFF data file formats, as well as column files and compact
(fixed-width, delimiter-less) data.
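
To make the last two formats concrete, the sketch below splits one
line of a column file and one line of compact data into feature
values and a class label. It assumes that columns are separated by
whitespace, that the class label is the last field, and that the
field width of compact data is supplied by the caller. These are
assumptions for illustration, and the function names are
hypothetical; the reference guide describes the formats TiMBL
actually accepts.

```python
# Illustrative sketch only: names are hypothetical, not TiMBL's reader.

def parse_columns(line):
    """Split a whitespace-separated line into (features, label).

    Assumes the class label is the last field.
    """
    fields = line.split()
    return tuple(fields[:-1]), fields[-1]

def parse_compact(line, width):
    """Split a delimiter-less line into fields of `width` characters.

    Assumes every field, including the final class label, has the same width.
    """
    line = line.rstrip("\n")
    if len(line) % width != 0:
        raise ValueError("line length is not a multiple of the field width")
    fields = [line[i:i + width] for i in range(0, len(line), width)]
    return tuple(fields[:-1]), fields[-1]

print(parse_columns("small red round apple"))
print(parse_compact("abcdXY", 2))
```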

-[download]-----------------------------------------------------------

You are invited to download the TiMBL package for educational or
non-commercial research purposes. When downloading the package you
are asked to register, and express your agreement with the license
terms. TiMBL is *not* shareware or public domain software.

The TiMBL software package can be downloaded from

http://ilk.kub.nl/software.html

or by following the `Software' link on the ILK home page at
http://ilk.kub.nl/ .

The TiMBL package contains the following:

- Source code (C++) with a Makefile.
- A reference guide containing descriptions of the incorporated
algorithms, detailed descriptions of the command-line options,
and a brief hands-on tutorial.
- Some example datasets.
- The text of the license agreement.
- A PostScript version of the paper that describes IGTREE.

The package should be easy to install on most UNIX systems.

-[background]---------------------------------------------------------

Memory-based learning (MBL) has proven quite successful in a large
number of Natural Language Processing (NLP) tasks. Applying MBL to
NLP tasks (text-to-speech, part-of-speech tagging, chunking, light
parsing) is the main research theme of the ILK group. We therefore
decided to build a well-coded, generic tool that combines the
group's algorithms, favorite optimization tricks, and interface
desiderata; the result is version 1.0 of TiMBL.

We think TiMBL can make a useful tool for NLP research, and, for that
matter, for any other domain with discrete classification tasks.

For information on the ILK Research Group, visit our site at

http://ilk.kub.nl/

On this site you can find links to (PostScript versions of)
publications relating to the algorithms incorporated in TiMBL and to
their application to NLP tasks.

The reference guide ("TiMBL: Tilburg Memory-Based Learner, version
1.0, Reference Guide.", Walter Daelemans, Jakub Zavrel, Ko van der
Sloot, and Antal van den Bosch. ILK Technical Report 98-03) can be
downloaded separately and directly from

http://ilk.kub.nl/~ilk/papers/ilk9803.ps.gz

For comments and bug reports relating to TiMBL, please send mail to

Timbl@kub.nl

----------------------------------------------------------------------