Corpora: statistics in CL question

From: Alexander S. Yeh (asy@mitre.org)
Date: Tue Mar 28 2000 - 01:49:54 MET DST

Next message: Ted Briscoe: "Corpora: Lectureship in Computational Linguistics / NLP"

Previous message: htakashi@mse.biglobe.ne.jp: "Corpora: Tools to convert HTML files into plain text"
Next in thread: Alexander S. Yeh: "Re: Corpora: statistics in CL question"
Reply: Alexander S. Yeh: "Re: Corpora: statistics in CL question"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Recently, I saw the following statement (author is unknown):

>In most studies of z-scores and t-scores in computational linguistics,
>you tend to find that scores are too high. When you compute scores
>for bigrams, for example, you would expect 5% of the scores would be
>greater than 1.65, but you tend to find more than that.

I am trying to find the studies referred to, and to learn why some people
believe that the scores are too high. Thank you.
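
For reference, the 1.65 in the quoted statement is roughly the one-sided 5%
point of a standard normal distribution, which is presumably where the
"expect 5%" figure comes from. Below is a minimal Python sketch of how one
might check the claim on a set of bigram counts. The t-score formula used
is the common collocation approximation (observed minus expected count,
divided by the square root of the observed count); that choice, the
function names, and the counts are my assumptions for illustration, not
taken from the quoted statement or from any particular study.

```python
import math


def upper_tail(z):
    """Probability that a standard normal variable exceeds z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def bigram_t_score(c1, c2, c12, n):
    """Approximate t-score for a bigram (assumed formulation).

    c1, c2: counts of the first and second word
    c12:    count of the bigram (must be > 0)
    n:      total number of bigram tokens in the corpus
    """
    expected = c1 * c2 / n  # expected bigram count under independence
    return (c12 - expected) / math.sqrt(c12)


def fraction_above(scores, threshold=1.65):
    """Share of scores strictly greater than the threshold."""
    if not scores:
        return 0.0
    return sum(1 for s in scores if s > threshold) / len(scores)


if __name__ == "__main__":
    # Nominal one-sided tail probability at 1.65 (close to 5%).
    print(round(upper_tail(1.65), 4))

    # Hypothetical counts, for illustration only.
    print(bigram_t_score(c1=100, c2=100, c12=16, n=10000))
```

If the independence assumption held and the normal approximation were
adequate, fraction_above applied to the t-scores of all observed bigrams
would come out near upper_tail(1.65); the quoted statement says that in
practice it comes out larger.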

-Alex Yeh (asy@mitre.org)

This archive was generated by hypermail 2b29 : Tue Mar 28 2000 - 09:13:29 MET DST