Re: [Corpora-List] N-gram string extraction

From: Klas Prytz (klas.prytz@ling.uu.se)
Date: Tue Aug 27 2002 - 16:39:45 MET DST

Next message: Stefan Evert: "Re: [Corpora-List] N-gram string extraction"

Previous message: andrius@ccl.bham.ac.uk: "[Corpora-List] N-gram string extraction"
In reply to: andrius@ccl.bham.ac.uk: "[Corpora-List] N-gram string extraction"
Next in thread: Stefan Evert: "Re: [Corpora-List] N-gram string extraction"

Hi,

Just one question: what is a significant n-gram?
In relation to what?

Regards

Klas Prytz
Institutionen för lingvistik
Uppsala universitet

On Tue, 27 Aug 2002 andrius@ccl.bham.ac.uk wrote:

> Dear list members,
>
> I am currently working on extracting statistically significant n-gram
> (1<n<6) strings of alphanumeric characters from a 100-million-character
> corpus, and I intend to apply different significance tests (MI, t-score,
> log-likelihood, etc.) to these strings. I'm testing Ted Pedersen's N-gram
> Statistics Package, which seems able to perform these tasks; however,
> it hasn't produced any results after one week of running.
> I have a couple of queries regarding n-gram extraction:
> 1. I'd like to ask whether members of the list know of similar software
> capable of accomplishing the above-mentioned tasks reliably and
> efficiently.
> 2. And a statistical question. As I also need to compute association
> scores for trigrams, tetragrams, and pentagrams, I plan to split each of
> them into a bigram consisting of a string of words plus one word,
> [n-1]+[1], and compute association scores for these bigrams.
> Does anyone know whether this is the right thing to do from a statistical
> point of view?
>
> Thank you,
> Andrius Utka
>
> Research Assistant
> Centre for Corpus Linguistics
> University of Birmingham
>
>
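A minimal sketch of the counting step described in the quoted message (all n-grams with 1<n<6), in Python. It assumes, hypothetically, that a token is a maximal run of ASCII letters and digits, which the message does not specify; `count_ngrams` is an illustrative name, not part of Ted Pedersen's package or any other tool.

```python
import re
from collections import Counter

# Assumption: a "token" is a maximal run of ASCII letters/digits.
# The original message does not say how strings are tokenized.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+")


def count_ngrams(text, n_min=2, n_max=5):
    """Count every n-gram with n_min <= n <= n_max in one pass over the tokens.

    Returns a dict mapping n to a Counter of token tuples.
    """
    tokens = TOKEN_RE.findall(text)
    counts = {n: Counter() for n in range(n_min, n_max + 1)}
    for i in range(len(tokens)):
        for n in range(n_min, n_max + 1):
            # Stop once an n-gram starting at position i would run past the end.
            if i + n > len(tokens):
                break
            counts[n][tuple(tokens[i:i + n])] += 1
    return counts
```

For example, `count_ngrams("the cat sat on the cat sat")[2][("the", "cat")]` is 2. For a corpus of 100 million characters, holding every 2- to 5-gram in memory as Python tuples may well exceed available RAM; sorting n-grams on disk or counting with a suffix array are common alternatives.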

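The [n-1]+[1] split from question 2 can be sketched as one 2x2 contingency table per n-gram: the first n-1 words act as a single unit, the last word as the other. This sketch assumes the marginal frequencies are taken from the n-gram table itself, so N is the number of n-gram positions, and uses common formulations of the three measures: MI = log2(O11/E11), t = (O11 - E11)/sqrt(O11), and G2 = 2 * sum(O * ln(O/E)). `split_scores` is a hypothetical helper; it only shows the mechanics and says nothing about whether the decomposition is statistically sound.

```python
import math
from collections import Counter


def split_scores(ngram_counts):
    """Score each n-gram as a pair (prefix = first n-1 tokens, last token).

    ngram_counts: Counter mapping n-token tuples (all the same n) to frequencies.
    Returns a dict mapping n-gram -> (MI, t-score, log-likelihood G2).
    """
    N = sum(ngram_counts.values())  # number of n-gram positions
    prefix_f = Counter()  # how often each prefix is the first n-1 tokens of an n-gram
    last_f = Counter()    # how often each token is the last token of an n-gram
    for gram, f in ngram_counts.items():
        prefix_f[gram[:-1]] += f
        last_f[gram[-1]] += f

    scores = {}
    for gram, o11 in ngram_counts.items():
        r1 = prefix_f[gram[:-1]]
        c1 = last_f[gram[-1]]
        # Observed 2x2 table: prefix yes/no crossed with last token yes/no.
        o12 = r1 - o11
        o21 = c1 - o11
        o22 = N - r1 - c1 + o11
        observed = (o11, o12, o21, o22)
        # Expected frequencies if prefix and last token were independent.
        r2 = N - r1
        c2 = N - c1
        expected = (r1 * c1 / N, r1 * c2 / N, r2 * c1 / N, r2 * c2 / N)
        e11 = expected[0]
        mi = math.log2(o11 / e11)
        t = (o11 - e11) / math.sqrt(o11)
        # Empty cells contribute 0 to G2 (limit of O * ln(O/E) as O -> 0).
        g2 = 2 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)
        scores[gram] = (mi, t, g2)
    return scores
```

For example, with `Counter({("a", "b"): 2, ("b", "a"): 2, ("a", "c"): 1})`, the pair ("a", "b") has O11 = 2 and E11 = 3 * 2 / 5 = 1.2, so its MI is log2(2/1.2), about 0.74.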