Re: Corpora: some language modeling questions

From: Daniel.Walker@lhsl.com
Date: Mon Jul 30 2001 - 22:13:34 MET DST

  • Next message: E Tjong Kim Sang: "Corpora: CoNLL-2001 proceedings available online"

    This is kind of ugly in ASCII with underscores but ...

    From "Class-based n-gram models of natural language," (Brown, 1990)
    P ( w_k | w_1 ... w_k-1) = P ( w_k | c_w_k ) P ( c_w_k | c_w_1 ... c_w_k-1)

    I'm using c_w_k instead of c_k to more strongly indicate that this is the
    class of the kth word. So, if you sum over all k words you get

    sum over w_k of P ( w_k | w_1 ... w_k-1 )
        = sum over w_k of P ( w_k | c_w_k ) P ( c_w_k | c_w_1 ... c_w_k-1 )

    not simply P ( c_w_k | c_w_1 ... c_w_k-1 ). Since each word belongs to
    exactly one class, grouping the terms by class gives the sum over c of
    P ( c | class history ) times the sum over w in c of P ( w | c ), and each
    inner sum is 1 if the within-class estimates are properly normalized. So
    the whole sum will be 1 for your corpus if you are using maximum
    likelihood estimates. Models do not necessarily have to sum to 1 over
    some corpus, though. For example, if you are discounting your estimates
    to leave some probability mass for phenomena which you may not have seen
    in the corpus, the sum may be less than 1. In this case the model should
    still sum to 1 over the full event space, but now that event space is
    larger than the corpus. At least, that's my understanding. Good luck!
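    A quick way to convince yourself of this is a toy numeric check. The
    vocabulary, classes, and probabilities below are made-up illustrative
    values (not from any corpus or from Brown et al.); the only thing that
    matters is that P ( w | c ) is normalized within each class and
    P ( c | history ) is normalized over classes.

    ```python
    # Toy check: the class-based factorization P(w|c_w) * P(c_w|history)
    # sums to 1 over all words, provided each factor is itself normalized.
    # All numbers are hypothetical illustrative values.

    # Hypothetical vocabulary partitioned into two classes, N and V.
    word_class = {"cat": "N", "dog": "N", "runs": "V", "sleeps": "V"}

    # P(w | c): emission probabilities, normalized within each class.
    p_word_given_class = {
        "cat": 0.6, "dog": 0.4,      # sums to 1 within class N
        "runs": 0.7, "sleeps": 0.3,  # sums to 1 within class V
    }

    # P(c | class history): an assumed class bigram distribution,
    # normalized over the set of classes for this particular history.
    p_class_given_history = {"N": 0.25, "V": 0.75}

    # Sum over all words w of P(w | c_w) * P(c_w | history).
    total = sum(
        p_word_given_class[w] * p_class_given_history[word_class[w]]
        for w in word_class
    )
    print(total)  # 1.0, up to floating-point rounding
    ```

    Grouping the terms by class reproduces the argument above: the N terms
    contribute 0.25, the V terms contribute 0.75, and the total is 1.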

    Daniel Walker

                        "F. Peng" <f3peng@logos.math.uwaterloo.ca>
                        Sent by: owner-corpora@lists.uib.no
                        07/30/2001 06:54 AM
                        To: CORPORA@HD.UIB.NO
                        cc:
                        Subject: Corpora: some language modeling questions

    I have some questions about language modeling. For the class-based n-gram
    models (Brown et al. 1990), the probability of word w_k given its history
    w_1_(k-1) is defined as

    Pr(w_k|w_1_(k-1)) = Pr(w_k|c_k)Pr(c_k|c_1_(k-1))

    where w_1_(k-1) is the history of word w_k: w_1...w_(k-1),
    c_k is the class which word w_k is in, and
    c_1_(k-1) is the class history of word w_k: c_1...c_(k-1).

    Under this definition, the sum of Pr(w_k|w_1_(k-1)) over all w_k
    is not equal to 1; it's Pr(c_k|c_1_(k-1)). Isn't it?

    Isn't it a necessary condition for a language model to satisfy the
    condition that \sum_w Pr(w|history) = 1?

    Maybe it's an easy question for you, but it puzzled me for a while.
    Thanks in advance for your help.

    Best regards

    Fuchun

    ---------------------------------------------------------
     Fuchun Peng
     Computer Science Department, University of Waterloo
     Waterloo, Ontario, Canada, N2L 3G1
     1-519-888-4567 ext 3478
     f3peng@ai.uwaterloo.ca
     --------------------------------------------------------



    This archive was generated by hypermail 2b29 : Mon Jul 30 2001 - 22:12:56 MET DST