Corpora: some language modeling questions

From: F. Peng (f3peng@logos.math.uwaterloo.ca)
Date: Mon Jul 30 2001 - 15:54:58 MET DST

Next message: James Cussens: "(no subject)"

Previous message: Ruslan Mitkov: "Corpora: RANLP 2001"
Next in thread: Daniel.Walker@lhsl.com: "Re: Corpora: some language modeling questions"
Reply: Daniel.Walker@lhsl.com: "Re: Corpora: some language modeling questions"

I have some questions about language modeling. For the class-based n-gram
models (Brown et al. 1990), the probability of word w_k given its history
w_1_(k-1) is defined as

Pr(w_k|w_1_(k-1)) = Pr(w_k|c_k)Pr(c_k|c_1_(k-1))

where w_1_(k-1) is the history of word w_k: w_1...w_(k-1),
c_k is the class which word w_k is in, and
c_1_(k-1) is the class history of word w_k: c_1...c_(k-1).

Under this definition, the sum of Pr(w_k|w_1_(k-1)) over all w_k
is not equal to 1; it is Pr(c_k|c_1_(k-1)), isn't it?

Isn't it a necessary condition for a language model that
\sum_w Pr(w|history) = 1?
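
A small numerical check may help separate the two sums involved here. The
sketch below is only an illustration under stated assumptions: the class
history is truncated to one previous class (a bigram), every word belongs to
exactly one class, and the vocabulary, class names, and probabilities are all
invented rather than taken from Brown et al.

```python
# Toy class-based bigram model. Everything here (words, classes, numbers)
# is hypothetical and chosen only so that each distribution sums to 1.
word_class = {"monday": "DAY", "friday": "DAY", "paris": "CITY", "rome": "CITY"}

# Pr(w | c): each class's distribution over its own member words.
p_word_given_class = {
    "DAY": {"monday": 0.6, "friday": 0.4},
    "CITY": {"paris": 0.7, "rome": 0.3},
}

# Pr(c_k | c_(k-1)): class bigram, one row per previous class.
p_class_given_prev = {
    "DAY": {"DAY": 0.2, "CITY": 0.8},
    "CITY": {"DAY": 0.5, "CITY": 0.5},
}


def word_prob(word, prev_class):
    """Pr(w_k | history) = Pr(w_k | c_k) * Pr(c_k | c_(k-1))."""
    c = word_class[word]
    return p_word_given_class[c][word] * p_class_given_prev[prev_class][c]


def sum_over_class(cls, prev_class):
    """Sum of word_prob over only the words that belong to class cls."""
    return sum(word_prob(w, prev_class)
               for w, c in word_class.items() if c == cls)


def sum_over_vocab(prev_class):
    """Sum of word_prob over every word in the vocabulary."""
    return sum(word_prob(w, prev_class) for w in word_class)


if __name__ == "__main__":
    for prev in p_class_given_prev:
        for cls in p_word_given_class:
            print(prev, cls, sum_over_class(cls, prev))
        print(prev, "whole vocabulary", sum_over_vocab(prev))
```

Under these assumptions, the sum restricted to the words of a single class
equals Pr(c_k|c_(k-1)), while the sum over the whole vocabulary also runs over
every class, because c_k changes with w_k, and comes to 1 up to floating-point
rounding.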

Maybe it's not a question for you, but it has puzzled me for a while. Thanks
in advance for any help.

Best regards

Fuchun

---------------------------------------------------------
Fuchun Peng
Computer Science Department, University of Waterloo
Waterloo, Ontario, Canada, N2L 3G1
1-519-888-4567 ext 3478
f3peng@ai.uwaterloo.ca
--------------------------------------------------------

This archive was generated by hypermail 2b29 : Mon Jul 30 2001 - 15:49:54 MET DST