Re: Corpora: some language modeling questions

From: Daniel.Walker@lhsl.com
Date: Mon Jul 30 2001 - 22:13:34 MET DST

  • Next message: E Tjong Kim Sang: "Corpora: CoNLL-2001 proceedings available online"

    This is kind of ugly in ASCII with underscores but ...

    From "Class-based n-gram models of natural language," (Brown, 1990)
    P ( w_k | w_1 ... w_k-1) = P ( w_k | c_w_k ) P ( c_w_k | c_w_1 ... c_w_k-1)

    I'm using c_w_k instead of c_k to more strongly indicate that this is the
    class of the kth word. So, if you sum over all k words you get

    sum over w_k of P ( w_k | w_1 ... w_k-1 )
        = sum over w_k of P ( w_k | c_w_k ) P ( c_w_k | c_w_1 ... c_w_k-1 )

    not simply P ( c_w_k | c_w_1 ... c_w_k-1 ). Since each word belongs to
    exactly one class, grouping the terms by class gives the sum over c of
    P ( c | class history ) times the sum over w in c of P ( w | c ), and each
    inner sum is 1 if the within-class estimates are properly normalized. So
    the whole sum will be 1 for your corpus if you are using maximum
    likelihood estimates. Models do not necessarily have to sum to 1 over
    some corpus, though. For example, if you are discounting your estimates
    to leave some probability mass for phenomena which you may not have seen
    in the corpus, the sum may be less than 1. In this case the model should
    still sum to 1 over the full event space, but now that event space is
    larger than the corpus. At least, that's my understanding. Good luck!
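    A quick way to convince yourself of this is a toy numeric check. The
    vocabulary, classes, and probabilities below are made-up illustrative
    values (not from any corpus or from Brown et al.); the only thing that
    matters is that P ( w | c ) is normalized within each class and
    P ( c | history ) is normalized over classes.

    ```python
    # Toy check: the class-based factorization P(w|c_w) * P(c_w|history)
    # sums to 1 over all words, provided each factor is itself normalized.
    # All numbers are hypothetical illustrative values.

    # Hypothetical vocabulary partitioned into two classes, N and V.
    word_class = {"cat": "N", "dog": "N", "runs": "V", "sleeps": "V"}

    # P(w | c): emission probabilities, normalized within each class.
    p_word_given_class = {
        "cat": 0.6, "dog": 0.4,      # sums to 1 within class N
        "runs": 0.7, "sleeps": 0.3,  # sums to 1 within class V
    }

    # P(c | class history): an assumed class bigram distribution,
    # normalized over the set of classes for this particular history.
    p_class_given_history = {"N": 0.25, "V": 0.75}

    # Sum over all words w of P(w | c_w) * P(c_w | history).
    total = sum(
        p_word_given_class[w] * p_class_given_history[word_class[w]]
        for w in word_class
    )
    print(total)  # 1.0, up to floating-point rounding
    ```

    Grouping the terms by class reproduces the argument above: the N terms
    contribute 0.25, the V terms contribute 0.75, and the total is 1.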

    Daniel Walker

                        "F. Peng" <f3peng@logos.math.uwaterloo.ca>
                        Sent by: owner-corpora@lists.uib.no
                        07/30/2001 06:54 AM
                        To: CORPORA@HD.UIB.NO
                        cc:
                        Subject: Corpora: some language modeling questions

    I have some questions about language modeling. For the class-based n-gram
    models (Brown et al. 1990), the probability of word w_k given its history
    w_1_(k-1) is defined as

    Pr(w_k|w_1_(k-1)) = Pr(w_k|c_k)Pr(c_k|c_1_(k-1))

    where w_1_(k-1) is the history of word w_k: w_1...w_(k-1),
    c_k is the class which word w_k is in, and
    c_1_(k-1) is the class history of word w_k: c_1...c_(k-1).

    Under this definition, the sum of Pr(w_k|w_1_(k-1)) over all w_k
    is not equal to 1; it's Pr(c_k|c_1_(k-1)). Isn't it?

    Isn't it a necessary condition for a language model to satisfy the
    condition that \sum_w Pr(w|history) = 1?

    Maybe it's an easy question for you, but it puzzled me for a while.
    Thanks in advance for your help.

    Best regards

    Fuchun

    ---------------------------------------------------------
     Fuchun Peng
     Computer Science Department, University of Waterloo
     Waterloo, Ontario, Canada, N2L 3G1
     1-519-888-4567 ext 3478
     f3peng@ai.uwaterloo.ca
     --------------------------------------------------------



    This archive was generated by hypermail 2b29 : Mon Jul 30 2001 - 22:12:56 MET DST