Research on Tokenisation

Colin Matheson (colin@cogsci.ed.ac.uk)
Tue, 20 Aug 96 15:54:50 +0100

I'd be very grateful for any pointers to recent work on tokenisation.
By 'tokenisation' I mean any text pre-processing stage in which
'words' (or other units) are identified and labelled.

Members of the Language Technology Group at the University of Edinburgh
are about to start work on a general-purpose tokenisation tool, which
will be made available to the research community; hence our interest in
the current state of the art.

Unless you think your response is of general interest, please reply
directly to the address below.

Colin
--------
Colin Matheson | Human Communication Research Centre
Phone: +44 131 650 4656 | University of Edinburgh
Fax: +44 131 650 4587 | 2 Buccleuch Place
Email: Colin.Matheson@ed.ac.uk | Edinburgh EH8 9LW
WWW: http://www.cogsci.ed.ac.uk/~colin | Scotland