Re: [Corpora-List] Sentence ambiguator/splitter summary

From: Staffan Hermansson (shend00@student.vxu.se)
Date: Thu Jan 29 2004 - 21:26:02 MET


    Hello people. Here's a brief summary of the things I've received. Some
    people were nice enough to attach documents; I've located most of those
    on the web for you.

    Again, thank you for your support.

    //Staffan

    Applications:
    A free CPAN Perl module for sentence splitting.
    http://listserv.linguistlist.org/cgi-bin/wa?A2=ind0302&L=corpora&P=R5743

    Shlomo Yona maintains another Perl-based sentence splitter.
    http://cs.haifa.ac.il/~shlomo/

    Earlier posts on this list (I might have missed some):
    http://helmer.aksis.uib.no/corpora/1998-4/0026.html
    http://helmer.aksis.uib.no/corpora/1999-3/0347.html
    http://helmer.aksis.uib.no/corpora/2000-2/0225.html
    http://helmer.aksis.uib.no/corpora/2003-1/0140.html
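The tools and papers in this summary all deal with the same basic ambiguity: a period can end a sentence, or it can belong to an abbreviation ("Dr.", "e.g.") or an initial ("J. Reynar"). To make that concrete, here is a minimal rule-based sketch in Python. It is not the method of any tool or paper listed here; the abbreviation list and the three heuristics (known abbreviation, single-letter initial, lower-case continuation) are illustrative assumptions.

```python
import re

# Hypothetical, hand-picked abbreviation list (lower-case, without the final
# period). Real splitters rely on much larger curated or learned resources.
ABBREVIATIONS = {"dr", "mr", "mrs", "prof", "etc", "e.g", "i.e", "vs", "eds"}

# Candidate boundary: '.', '!' or '?', optional closing quotes/brackets,
# then at least one whitespace character.
CANDIDATE = re.compile(r"[.!?][\"')\]]*\s+")


def is_boundary(text: str, mark_pos: int, next_pos: int, prev_token: str) -> bool:
    """Decide whether the punctuation mark at mark_pos ends a sentence."""
    if text[mark_pos] in "!?":
        return True  # treat '!' and '?' as unambiguous in this sketch
    word = prev_token.lower().lstrip("\"'([")
    if word in ABBREVIATIONS:
        return False  # known abbreviation such as "Dr." or "e.g."
    if len(word) == 1 and word.isalpha():
        return False  # single-letter initial such as "J."
    if next_pos < len(text) and text[next_pos].islower():
        return False  # continuation in lower case, e.g. "... ca. five ..."
    return True


def split_sentences(text: str) -> list[str]:
    """Split text into sentences using the heuristics above."""
    sentences = []
    start = 0
    for m in CANDIDATE.finditer(text):
        # The token immediately before the punctuation mark.
        tokens = text[start:m.start()].split()
        prev_token = tokens[-1] if tokens else ""
        if is_boundary(text, m.start(), m.end(), prev_token):
            sentences.append(text[start:m.end()].strip())
            start = m.end()
    # Whatever remains after the last accepted boundary is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

For example, `split_sentences("Dr. Smith read the paper by J. Reynar and A. Ratnaparkhi. It was short!")` should give `["Dr. Smith read the paper by J. Reynar and A. Ratnaparkhi.", "It was short!"]`. Rules like these break on an abbreviation that really does end a sentence ("... pears, etc. The next ..."), which is the kind of case several of the papers below address.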

    Reports:

    Ghassan Mourad was kind enough to send me the following. Though I can't
    read a word of French (thanks anyway), it might still be of interest.

    Ghassan Mourad (1999)
    La segmentation de textes par l'étude de la ponctuation
    [Text segmentation through the study of punctuation]
    http://www.lalic.paris4.sorbonne.fr/articles/1998-1999/Mourad/CIDE99.pdf

    Ghassan Mourad
    La segmentation de textes par exploration contextuelle automatique,
    présentation du module SegATex
    [Text segmentation by automatic contextual exploration: presentation of
    the SegATex module]
    Ghassan.Mourad@paris4.sorbonne.fr

    Greg Grefenstette and Pasi Tapanainen. "What is a word, what is a
    sentence? Problems of tokenization."
    http://citeseer.nj.nec.com/grefenstette94what.html

    Tibor Kiss and Jan Strunk
    Scaled log likelihood ratios for the detection of abbreviations in text
    corpora
    http://www.linguistics.rub.de/~kiss/publications/abbrev.pdf

    Tibor Kiss and Jan Strunk
    Multilingual Least-Effort Sentence Boundary Disambiguation
    http://www.linguistics.rub.de/~kiss/publications/publications.html#boundaries

    Andrei Mikheev. "Text Segmentation." In R. Mitkov (ed.) Oxford Handbook
    of Computational Linguistics, OUP, 2003.

    Andrei Mikheev
    Tagging Sentence Boundaries (2000)
    http://citeseer.nj.nec.com/mikheev00tagging.html

    Andrei Mikheev
    Periods, Capitalized Words, etc (1999)
    http://citeseer.nj.nec.com/mikheev99periods.html

    David D. Palmer (2000)
    "Tokenisation and Sentence Segmentation." In Robert Dale, Hermann Moisl
    and Harold Somers (Eds), A Handbook of Natural Language Processing,
    Marcel Dekker.

    David D. Palmer and Marti A. Hearst
    Adaptive Multilingual Sentence Boundary Disambiguation
    http://citeseer.nj.nec.com/palmer97adaptive.html

    J. Reynar and A. Ratnaparkhi
    A Maximum Entropy Approach to Identifying Sentence Boundaries
    http://citeseer.nj.nec.com/article/reynar97maximum.html
