Corpora: stats summary

From: Marco Antonio Esteves da Rocha (marcor@cce.ufsc.br)
Date: Fri Mar 02 2001 - 01:11:52 MET

    Dear all,

    Here goes the summary of free or cheap statistical resources mentioned by
    list members in response to my query:

    *************************************************

    There's a student version of SPSS that is pretty much like the $900 (US)
    version but lacks some of the more advanced statistical tests (e.g.
    loglinear analyses). I've used it and it's quite good. I forget the exact
    price, but it's under $100 (US).

    -Charles Meyer, UMass-Boston

    *************************************************

    From: Cam.Fordyce@lhsl.com (Sat Feb 24 20:37:56 2001)

    Hi Marco,

    You could look at www.perl.com or any other site that has access to CPAN,
    an archive of Perl modules. There you will find the following modules
    that might be of use.

    Cam

    Here is a listing of some of the statistics-related modules at the
    above site.

      Math::CDF -- Module
          Math::CDF gives probabilities and quantiles from several statistical
          probability functions, including the normal distribution, t-dist,
          F-dist and others. Non-centrality functions are available for some
          distributions. The module is an interface to the DCDFLIB library of
          C programs. The DCDFLIB source is included with the Math::CDF module
          with permission of its authors.
      Statistics::ChiSquare -- Module
          How random is your data? The Chi Square test tells you.
      Statistics::Descriptive -- Module
          Commonly used statistical methods: mean, variance, standard
          deviation, least squares fit, and so on.
      Statistics::LTU -- Module
          A module for manipulating Linear Threshold Units, also called
          perceptrons, which are neural networks with no hidden layers.
      Statistics::MaxEntropy -- Module
          Object-oriented implementation of Generalised Iterative Scaling
          algorithm, Improved Iterative Scaling algorithm, and Feature
          Induction algorithm for inducing maximum entropy probability
          distributions.
      Statistics::OLS -- Module
          Statistics::OLS (Ordinary Least Squares) computes the estimated
          slope and intercept of the regression line, their T-statistics, R
          squared, standard error of the regression and the Durbin-Watson
          statistic. It can also return the residuals.
      Statistics::ROC -- Module
          Statistics::ROC (receiver-operator-characteristic) determines the
          ROC curve and its nonparametric confidence bounds for data
          categorized into two groups. A ROC curve shows the relationship of
          probability of false alarm (x-axis) to probability of detection
          (y-axis) for a certain test. Expressed in medical terms: the
          probability of a positive test, given no disease, to the
          probability of a positive test, given disease. The ROC curve may be
          used to determine an optimal cutoff point for the test.
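
    To make the Statistics::OLS entry above more concrete, here is a minimal
    sketch of the same quantities (slope, intercept, R squared, residuals and
    the Durbin-Watson statistic) in plain Python. It is not the Perl module
    and does not reproduce its interface: the function name ols and the
    dictionary keys are assumptions made for illustration, and the
    T-statistics and standard error are left out.

```python
import math


def ols(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares.

    Hypothetical helper for illustration; not the Statistics::OLS API.
    """
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired observations")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residuals: observed y minus fitted y.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    ss_res = sum(e * e for e in residuals)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)

    # R squared is undefined when y never varies.
    r_squared = 1 - ss_res / ss_tot if ss_tot > 0 else math.nan

    # Durbin-Watson: squared successive residual differences over the
    # residual sum of squares; undefined for a perfect fit.
    if ss_res > 0:
        dw = sum((residuals[i] - residuals[i - 1]) ** 2
                 for i in range(1, n)) / ss_res
    else:
        dw = math.nan

    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": r_squared,
        "durbin_watson": dw,
        "residuals": residuals,
    }


if __name__ == "__main__":
    fit = ols([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
    for key in ("slope", "intercept", "r_squared", "durbin_watson"):
        print(key, round(fit[key], 4))
```

    A Durbin-Watson value near 2 suggests little first-order autocorrelation
    in the residuals; values toward 0 or 4 suggest positive or negative
    autocorrelation respectively.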

    ****************************************************************

    Hi,
    I certainly wish you the best of luck with your
    project - I think statistical work is the way to go. :)

    I've heard of, but not yet used, a free statistical
    programming package called R. It's a freeware
    counterpart to the very popular S and S-Plus
    stats programming packages. Here's a URL:
     http://www.r-project.org/

    ************************************************************

    From: henning.reetz@uni-konstanz.de (Sat Feb 24 20:38:44 2001)

    Hi Marco,

    You should take a look at the JMP package from SAS. We pay about the
    equivalent of $50 for our university-related licence; the normal price
    is something like $500, so check whether you can get it for a lower
    price via a research institution related to you. There is also a
    student version, JMP IN. It has a graphical user interface (on Mac and
    Windows; I don't know about UNIX/Linux versions). It is a
    general-purpose system with lots of graphical representation (you can
    AND/OR graphically), has a very complex ANOVA (it can handle many more
    things than SPSS), and is fast and reliable (SPSS runs any ANOVA; JMP
    barks if there are linear dependencies in the data). The user interface
    is okay once you have mastered the sometimes strange concepts (e.g.,
    they use a postfix language for their logical terms), and you can also
    write scripts.

    I use it for many more things than statistical evaluation, for
    example you can formulate things like "select all 3-syllable words
    from the CELEX database and sort them by the medial syllable" (once
    you have read in the CELEX database).
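
    As a rough illustration of the kind of query described above (this is
    not JMP syntax and not the CELEX file format), a select-and-sort over a
    syllabified word list might be sketched in Python as follows. The toy
    lexicon, its syllabifications and the function name are all invented
    for the example.

```python
# Hypothetical toy lexicon: (word, list of syllables). Real CELEX records
# have a different layout and would need their own parsing step.
lexicon = [
    ("banana", ["ba", "na", "na"]),
    ("computer", ["com", "pu", "ter"]),
    ("table", ["ta", "ble"]),
    ("elephant", ["e", "le", "phant"]),
    ("statistics", ["sta", "tis", "tics"]),
    ("corpus", ["cor", "pus"]),
]


def three_syllable_by_medial(entries):
    """Keep words with exactly three syllables, sorted by the middle one."""
    selected = [(word, syls) for word, syls in entries if len(syls) == 3]
    # Sort on the medial syllable; break ties on the word itself.
    return sorted(selected, key=lambda entry: (entry[1][1], entry[0]))


if __name__ == "__main__":
    for word, syls in three_syllable_by_medial(lexicon):
        print(syls[1], word)
```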

    The URL is http://www.jmpdiscovery.com/ and you can also download a demo.

    The only thing I don't know is whether you can get it somehow for a
    price as low as $50, but first take a look at whether it would be
    interesting at all. It is much easier to handle than anything else,
    and there is no comparison with the normal SAS package.

    Henning Reetz

    ******************************************************

    From: "TOYOSHIMA,Masayuki" <mtoyo@aa.tufs.ac.jp>

    I have written 3 tests in perl, i.e.
            Chi-square test (table-t.pl)
            T-test (avrg-t.pl)
            test of proportions (ratio-t.pl)
    http://jcs.aa.tufs.ac.jp/mtoyo/stats/stats-pl.zip

    I am sorry to say that all the documentation is in Japanese.
    But the Perl scripts themselves are in Perl :-) with comments in
    English.
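
    For readers who want to see what two of the tests named above compute,
    here is a minimal Python sketch of a Pearson chi-square statistic for a
    table of counts and a two-sided test of two proportions. It is not a
    port of the Perl scripts above, which I have not reproduced; the
    function names and the example counts are assumptions for illustration.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table
    of observed counts given as a list of equal-length rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            if expected == 0:
                raise ValueError("a row or column has a zero total")
            stat += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """z statistic and two-sided p-value for the difference between two
    proportions, using the pooled standard error and the normal
    approximation (reasonable only for fairly large samples)."""
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Invented counts: a feature occurs in 30 of 100 samples from one
    # corpus and in 50 of 100 samples from another.
    stat, df = chi_square([[30, 70], [50, 50]])
    z, p = two_proportion_z(30, 100, 50, 100)
    print("chi-square:", round(stat, 4), "df:", df)
    print("z:", round(z, 4), "p:", round(p, 4))
```

    For a 2 x 2 table without continuity correction the chi-square
    statistic equals the square of z, so the two tests agree.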

    *********************************************************

    From: manning@cs.stanford.edu (Sat Feb 24 20:43:17 2001)

    Marco,

    You could try R, a totally free implementation of the S statistics
    programming language:

            http://www.r-project.org/

    R has everything you need. The main possible disadvantage of R (or S)
    versus packages like SPSS/SAS is that they are much more programming
    languages customized for statisticians rather than statistics
    packages. So, they require more technical competence on the part of
    users.

    ********************************************************

    From: George Foster <foster@IRO.UMontreal.CA>

    Hi,

    Lispstat is good, fun, and free, though not particularly intended for NLP.
    Have a look at:

    http://www.stat.umn.edu/~luke/xls/xlsinfo/xlsinfo.html

    George

    ********************************************************

    From: Paul Clough <p.clough@dcs.shef.ac.uk>

    Hi Marco,

    Have you tried the Perl CPAN pages? A small number of statistical
    functions can be found here:

    http://www.perl.com/reference/query.cgi?statistics

    Also, if you just want a free data analysis/statistical package to use,
    have you tried R?

    http://cran.r-project.org/doc/manuals/R-intro.pdf

    http://cran.r-project.org/

    Paul.

    *********************************************

    From: Patrick Ruch <ruch@dim.hcuge.ch>

    For all the above needs, we use S-PLUS. They have very nice educational
    prices, about $20 for students (at least, that is what it costs at the
    University of Geneva). This is more a matter of marketing, and I do not
    know if this price is the same for every university, but you could get
    in touch with S-PLUS sellers to get comparable prices!

    ********************************************

    From: John Aitchison <jaitchis@lisp.com.au>

    R (also called gnu S) is FREE, runs on a variety of platforms, has a huge
    range of procedures....

    www.r-project.org

    I use it and love it. Forget SPSS and SAS and SPLUS and .. well, you
    just need R.

    *************************************************

    From: Mike Scott <lexically@btinternet.com>

    Hi Marco Antonio,

    http://uk.torry.net/statistic.htm

    has Delphi components for statistics, naturally only for those who
    program in Pascal. Some may be useful... those marked S are source
    included, those marked F are free, etc.

    [] Mike Scott

    (Mike wrote in Portuguese; the above is a translation. The original
    reads "Dephi", presumably Delphi.)

    *****************************************************

    From: "Melamed, Dan" <Dan.Melamed@westgroup.com>

    Much of what you need is here:

    http://www.acm.org/~perlman/statinfo.html

    IDM

    *************************************************

    Thanks to all those people who took the time to respond. The R system got
    more votes than any other solution (no hanging chads). I have already
    downloaded it and it seems really good. I will be testing Lispstat soon.
    Other solutions will be looked into next. I decided to post the summary of
    mentioned resources before finishing testing, as this will obviously take
    a long time.

    Cheers,

    Marco



    This archive was generated by hypermail 2b29 : Fri Mar 02 2001 - 21:45:12 MET