Corpora: Book: Word frequency distributions

From: Jean Veronis (Jean.Veronis@newsup.univ-mrs.fr)
Date: Tue Sep 11 2001 - 14:15:26 MET DST

    **** NEW BOOK *** NEW BOOK *** NEW BOOK *** NEW BOOK *** NEW BOOK ****

                            KLUWER ACADEMIC PUBLISHERS
                       TEXT, SPEECH AND LANGUAGE TECHNOLOGY
                                   Volume 18
                  Series editors: Nancy Ide and Jean Véronis

                          WORD FREQUENCY DISTRIBUTIONS
                                      by
                             R. Harald Baayen
                     University of Nijmegen, The Netherlands

    This book is a comprehensive introduction to the statistical analysis of
    word frequency distributions, intended for computational linguists, corpus
    linguists, psycholinguists, and researchers in the field of quantitative
    stylistics. Word frequency distributions are characterized by very large
    numbers of rare words. This property leads to strange phenomena such as
    mean frequencies that systematically change as the number of observations
    is increased, relative frequencies that even in large samples are not fully
    reliable estimators of population probabilities, and model parameters that
    vary with text or corpus size. Special statistical techniques for the
    analysis of distributions with large numbers of rare events can be found in
    various technical journals. The aim of this book is to make these
    techniques more accessible to non-specialists, both theoretically, by
    means of a careful introduction to the underlying probabilistic and
    statistical concepts, and practically, by providing a program library
    implementing the main models for word frequency distributions (CD-ROM
    included).
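
    The first phenomenon mentioned above, a mean frequency that keeps
    changing as the sample grows, can be illustrated with a small
    simulation. The sketch below is not taken from the book or its program
    library; it assumes, purely for illustration, a Zipf-like population
    with probabilities proportional to 1/rank over a hypothetical
    vocabulary of 50,000 types. The function name and its parameters are
    likewise made up for this example. It draws tokens at random and
    reports the mean frequency N / V(N), where V(N) is the number of
    distinct types seen among the first N tokens.

```python
import random
from itertools import accumulate


def mean_frequency_growth(sizes, vocab_size=50_000, seed=0):
    """Return (N, V(N), N / V(N)) for each sample size N in `sizes`.

    Tokens are drawn from an assumed Zipf-like population in which the
    type of rank r has probability proportional to 1 / r.
    """
    rng = random.Random(seed)
    # Cumulative weights let random.choices sample by bisection.
    cum_weights = list(accumulate(1.0 / r for r in range(1, vocab_size + 1)))
    tokens = rng.choices(range(vocab_size), cum_weights=cum_weights,
                         k=max(sizes))

    results = []
    seen = set()   # distinct types observed so far
    start = 0
    for n in sorted(sizes):
        seen.update(tokens[start:n])   # extend the sample to n tokens
        start = n
        results.append((n, len(seen), n / len(seen)))
    return results


if __name__ == "__main__":
    for n, v, mean in mean_frequency_growth([1_000, 10_000, 100_000]):
        print(f"N = {n:>7}  V(N) = {v:>6}  mean frequency = {mean:.2f}")
```

    Under these assumptions the mean frequency rises with each tenfold
    increase in N instead of settling on a fixed value, because new rare
    types keep appearing more slowly than tokens accumulate.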

    Kluwer Academic Publishers, Dordrecht
    Hardbound, ISBN 0-7923-7017-1
    June 2001, 356 pp.
    EUR 117.00 / USD 108.00 / GBP 74.00

    ---------------------------------------------------------------------

    CONTENTS

    1. Word Frequencies.

    2. Non-parametric models.

    3. Parametric models.

    4. Mixture distributions.

    5. The Randomness Assumption.

    6. Examples of Applications.

    A. List of Symbols.

    B. Solutions of the exercises.

    C. Software.

    D. Data sets.

    Bibliography.

    Index.

    CD-ROM Included

    ---------------------------------------------------------------------

                                PREVIOUS VOLUMES

        Volume 1: Recent Advances in Parsing Technology
                   Harry Bunt, Masaru Tomita (Eds.)
                   Hardbound, ISBN 0-7923-4152-X, 1996

        Volume 2: Corpus-Based Methods in Language and Speech Processing
                   Steve Young, Gerrit Bloothooft (Eds.)
                   Hardbound, ISBN 0-7923-4463-4, 1997

        Volume 3: An introduction to text-to-speech synthesis
                   Thierry Dutoit
                   Hardbound, ISBN 0-7923-4498-7, 1997

        Volume 4: Exploring textual data
                   Ludovic Lebart, André Salem and Lisette Berry
                   Hardbound, ISBN 0-7923-4840-0, December 1997

        Volume 5: Time Map Phonology:
                   Finite State Models and Event Logics in Speech
                   Recognition
                   Julie Carson-Berndsen
                   Hardbound, ISBN 0-7923-4883-4, 1997

        Volume 6: Predicative Forms in Natural Language and in
                   Lexical Knowledge Bases
                   Patrick Saint-Dizier (Ed.)
                   Hardbound, ISBN 0-7923-5499-0, December 1998

        Volume 7: Natural Language Information Retrieval
                   Tomek Strzalkowski (Ed.)
                   Hardbound, ISBN 0-7923-5685-3, April 1999

        Volume 8: Techniques in Speech Acoustics
                   Jonathan Harrington, Steve Cassidy
                   Hardbound, ISBN 0-7923-5731-0, July 1999

        Volume 9: Syntactic Wordclass Tagging
                   Hans van Halteren (Ed.)
                   Hardbound, ISBN 0-7923-5896-1, August 1999

        Volume 10: Breadth and Depth of Semantic Lexicons
                   Viegas, E. (Ed.)
                   Hardbound, ISBN 0-7923-6039-7, November 1999

        Volume 11: Natural Language Processing Using Very Large Corpora
                   Armstrong, S., Church, K.W., Isabelle, P.,
                   Manzi, S., Tzoukermann, E., Yarowsky, D. (Eds.)
                   Hardbound, ISBN 0-7923-6055-9, November 1999

        Volume 12: Lexicon Development for Speech and Language Processing
                   Frank van Eynde & Dafydd Gibbon (Eds.)
                   Hardbound, ISBN 0-7923-6368-X, April 2000

        Volume 13: Parallel text processing:
                   Alignment and use of translation corpora
                   Jean Véronis (Ed.)
                   Hardbound, ISBN 0-7923-6546-1, August 2000

        Volume 14: Prosody: theory and experiment
                   Studies Presented to Gösta Bruce
                   Merle Horne (Ed.)
                   Hardbound, ISBN 0-7923-6579-8, August 2000

        Volume 15: Intonation: Analysis, Modelling and Technology
                   Antonis Botinis (Ed.)
                   Hardbound, ISBN 0-7923-6605-0, October 2000
                   Paperback, ISBN 0-7923-6723-5, October 2000

        Volume 16: Advances in probabilistic and other parsing technologies
                   Harry Bunt, Anton Nijholt (Eds.)
                   Hardbound, ISBN 0-7923-6616-6, October 2000

        Volume 17: Robustness in language and speech technology
                   Jean-Claude Junqua, Gertjan van Noord (Eds.)
                   Hardbound, ISBN 0-7923-6790-1, February 2001

    Check the series Web page for order information:

        http://www.wkap.nl/series.htm/TLTB


