RE: [Corpora-List] corpus homogeneity

From: A.DeRoeck (A.Deroeck@open.ac.uk)
Date: Tue Sep 14 2004 - 15:05:41 MET DST

    We've done some further work, using Adam's work as a starting point, in

    A. De Roeck, A. Sarkar and P. Garthwaite. "Frequent Term Distribution
    Measures for Dataset Profiling". Proceedings of LREC, pp. 1647-1651,
    Lisbon. A longer description of the work is also available as a
    technical report:

    Technical Report Number 2004/07
    Title: Defeating the Homogeneity Assumption: some findings on the
    distribution of very frequent terms
    Author(s): A. De Roeck, A. Sarkar, P. Garthwaite

    It is available here:
    http://computing-reports.open.ac.uk/index.php/2004/200407

    Anne

    > -----Original Message-----
    > From: owner-corpora@lists.uib.no
    > [mailto:owner-corpora@lists.uib.no] On Behalf Of Adam Kilgarriff
    > Sent: 13 September 2004 17:18
    > To: 'Cormac O'Brien'; corpora
    > Subject: RE: [Corpora-List] corpus homogeneity
    >
    >
    > Cormac,
    >
    > No software to offer, but an easy-to-implement measure
    > is defined in my "Comparing Corpora", Int Jnl of Corpus
    > Linguistics, 6 (1) 2001 Pp 1-37, also ITRI-01-15 available at
    > http://www.itri.brighton.ac.uk/techreports/
    >
    > Adam
    >
    >
    > -----Original Message-----
    > From: owner-corpora@lists.uib.no
    > [mailto:owner-corpora@lists.uib.no] On Behalf Of Cormac
    > O'Brien
    > Sent: 07 September 2004 09:50
    > To: corpora@hd.uib.no
    > Subject: [Corpora-List] corpus homogeneity
    >
    >
    > Hi,
    >
    > Does anyone have a program for testing corpus homogeneity?
    > I'd be very grateful.
    >
    > Cormac
    >
    > -----------------------------------------
    > Cormac O'Brien
    > Postgraduate Student (M.Sc. by research)
    > Computational Linguistics Group
    > Trinity College, Dublin
    >
    > Tel: 00353 1 608 2866


