Re: Corpora: efficient string matching

R Chandrasekar (mickeyc@linc.cis.upenn.edu)
Sat, 11 Apr 1998 16:37:19 -0400 (EDT)

> Tom Vanallemeersch writes:
> > What interests
> > me ... is to know whether there are programs which in some way store
> > information on the substrings of a text (e.g. all substrings up to a
> > length of 10), and possibly give the context of those strings.
>
> This sounds like an application for a Patricia tree (I think this is the
> name). A good book on algorithms should describe this data structure.
>

Yes, Tom may wish to look up

@book{frakes-yates92,
author = "W. B. Frakes and R. S. Baeza-Yates",
title = "Information Retrieval: Data Structures and Algorithms",
publisher = "Prentice Hall",
year = 1992}

Look for terms such as Pat-trees and Pat-arrays as well.
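
As a rough illustration of the idea behind a Pat-array (essentially a
sorted array of the start positions of all suffixes of the text, also
known as a suffix array), a minimal sketch in Python might look like the
following. The function names are hypothetical and are not taken from
the book, and the naive sort used to build the array is only reasonable
for small texts; practical systems use more careful construction
methods.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes, in sorted suffix order.

    Naive construction (assumption: the text is small enough that
    comparing whole suffixes during the sort is acceptable).
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return all start positions of `pattern` in `text`, in text order.

    Uses two binary searches over the suffix array `sa` to locate the
    block of suffixes that begin with `pattern`.
    """
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


def contexts(text, positions, length, width=10):
    """Return each match with up to `width` characters on either side."""
    return [text[max(0, p - width):p + length + width] for p in positions]


if __name__ == "__main__":
    text = "abracadabra"
    sa = build_suffix_array(text)
    hits = find_occurrences(text, sa, "abra")
    print(hits)
    print(contexts(text, hits, len("abra"), width=2))
```

Because every substring of the text is a prefix of some suffix, one
sorted array of this kind answers queries for substrings of any length
(not only those up to 10 characters), and the stored positions give the
surrounding context directly.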

Regards,

-- Chandrasekar

-- 
Raman Chandrasekar,      CASI/Instt for Research in Cognitive Science, 
Univ of Pennsylvania,3401 Walnut St, Suite 400A, Philadelphia PA 19104
Phone: +1-215-898-0332,  Fax: +1-215-573-9247,   Home: +1-610-352-5512
mickeyc@linc.cis.upenn.edu          http://www.cis.upenn.edu/~mickeyc/