Re: [Corpora-List] automatic search for orthographic recurring patterns

From: Shlomo Argamon (argamon@iit.edu)
Date: Wed Dec 08 2004 - 16:42:00 MET


    See our paper in COLING-04:

    Shlomo Argamon, Navot Akiva, Amihood Amir, and Oren Kapah.
    Efficient Unsupervised Recursive Word Segmentation Using Minimum
    Description Length.
    Proceedings of The 20th International Conference on Computational
    Linguistics (COLING), August 2004.

    Available at http://lingcog.iit.edu/pub.xml

            -Shlomo-

    MARC FRYD wrote:
    > Hi,
    > Perhaps someone on the List will be able to help me with the following
    > datamining problem:
    >
    > Given a corpus of isolated lexical units or collocations, I would like
    > to determine recurring orthographic patterns, whether initial, e.g.
    > "CARPO" (carpogenic, carpogenous, carpolite), final, e.g. "IONALISM"
    > (sensationalism, functionalism, etc.), or internal, e.g. "CHRON"
    > (synchrony, synchronize, etc.).
    > The output should be arranged so as to show the respective productivity
    > of each pattern.
    > Important constraint: the various patterns will *not* be fed in
    > initially but should be extracted as a result of the algorithm.
    > I'll post a summary if I get several replies.
    > Regards to all list members.
    > Marc Fryd
    >
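A naive baseline for the question quoted above can make the problem concrete. This is not the MDL segmentation method from the cited COLING-04 paper. The sketch enumerates every substring inside a length window, files it as initial (proper prefix), final (proper suffix) or internal (touching neither word edge), and ranks it by the number of distinct words it occurs in. That count is only a rough stand-in for "productivity". The function name `orthographic_patterns`, the 3 to 8 character window and the `min_count` threshold are illustrative assumptions.

```python
from collections import defaultdict


def orthographic_patterns(words, min_len=3, max_len=8, min_count=2):
    """Rank recurring substrings by position without any seed patterns.

    Illustrative baseline only; not the MDL algorithm from the COLING-04 paper.
    Productivity is approximated as the number of distinct words in which a
    substring occurs in a given position.
    """
    # position -> substring -> set of distinct words containing it there
    hits = {
        "initial": defaultdict(set),
        "final": defaultdict(set),
        "internal": defaultdict(set),
    }
    for word in {w.lower() for w in words}:
        n = len(word)
        # Only proper prefixes/suffixes: a pattern must be shorter than the word.
        for length in range(min_len, min(max_len, n - 1) + 1):
            hits["initial"][word[:length]].add(word)
            hits["final"][word[n - length:]].add(word)
            # Internal: the substring may touch neither the first nor the last letter.
            for start in range(1, n - length):
                hits["internal"][word[start:start + length]].add(word)

    results = {}
    for position, table in hits.items():
        counts = {p: len(ws) for p, ws in table.items() if len(ws) >= min_count}
        # Most productive first; among ties, prefer longer patterns.
        results[position] = sorted(
            counts.items(), key=lambda kv: (-kv[1], -len(kv[0]), kv[0])
        )
    return results


if __name__ == "__main__":
    sample = ["carpogenic", "carpogenous", "carpolite", "sensationalism",
              "functionalism", "synchrony", "synchronize"]
    for position, ranked in orthographic_patterns(sample).items():
        print(position, ranked[:5])
```

Nothing is pruned here. Nested candidates such as "car", "carp" and "carpo" therefore all surface with the same count. Deciding which of them is the "real" unit is the part a principled criterion such as minimum description length is meant to address.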



    This archive was generated by hypermail 2b29 : Wed Dec 08 2004 - 17:50:29 MET