Corpora: JHU CLSP Workshop 2000 - Summer Internships

From: Amy Berdann (berdann@jhu.edu)
Date: Thu Jan 27 2000 - 17:16:04 MET


    Dear Colleague:

    The Center for Language and Speech Processing at the Johns Hopkins
    University is offering a unique summer internship opportunity, which
    we would like you to bring to the attention of your best students in
    the current junior class.

    This internship is unique in that the selected students will
    participate in cutting-edge research as full team members alongside
    leading scientists from industry, academia, and government. What makes
    the internship especially exciting is that it exposes undergraduate
    students to the emerging fields of language engineering, such as
    automatic speech recognition (ASR), natural language processing (NLP),
    machine translation (MT), and speech synthesis (TTS).

    We are specifically looking to attract new talent into the field and,
    as such, do not require the students to have prior knowledge of
    language engineering technology. Please take a few moments to
    nominate suitable bright students who may be interested in this
    internship. On-line applications for the program can be found at
    http://www.clsp.jhu.edu/workshops along with additional information
    regarding plans for the 2000 Workshop and information on past
    workshops. The application deadline is January 28, 2000.

    If you have questions, please contact us by phone (410-516-7730),
    e-mail (sec@clsp.jhu.edu), or via the Internet
    (http://www.clsp.jhu.edu).

                                            Sincerely,

            
                                            Frederick Jelinek
                                            J.S. Smith Professor and Director

    Project Descriptions*

    1. Reading Comprehension

    Building a computer system that can acquire information by reading
    texts has been a long-standing goal of computer science. Consider
    designing a computer system that can take the following third-grade
    reading comprehension exam.

      How Maple Syrup is Made

      Maple syrup comes from sugar maple trees. At one time, maple syrup
      was used to make sugar. This is why the tree is called a "sugar"
      maple tree. Sugar maple trees make sap. Farmers collect the sap.
      The best time to collect sap is in February and March. The nights
      must be cold and the days warm. The farmer drills a few small holes
      in each tree. He puts a spout in each hole. Then he hangs a bucket
      on the end of each spout. The bucket has a cover to keep rain and
      snow out. The sap drips into the bucket. About 10 gallons of sap
      come from each hole.

      1. Who collects maple sap? (Farmers)
      2. What does the farmer hang from a spout? (A bucket)
      3. When is sap collected? (February and March)
      4. Where does the maple sap come from? (Sugar maple trees)
      5. Why is the bucket covered? (To keep rain and snow out)

    Such exams measure understanding by asking a variety of questions.
    Different types of questions probe different aspects of understanding.

    Existing techniques currently earn a grade of roughly 40%: still
    failing, but encouraging. We will investigate methods by which a
    computer can
    understand the text better, and hope that by the end of the workshop
    the computer will be ready to move on to the fourth grade!
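
    To give a concrete feel for the task, here is a small sketch in
    Python of the kind of bag-of-words sentence-matching baseline that
    such existing techniques build on: answer each question with the
    passage sentence that shares the most content words with it. The
    stopword list, the crude stemming, and the miniature passage are
    illustrative assumptions, not the workshop system.

      # Illustrative bag-of-words baseline for reading comprehension:
      # pick the passage sentence with the largest word overlap with the
      # question.  (Not the workshop system; assumptions throughout.)
      import re

      STOPWORDS = {"the", "a", "an", "is", "in", "of", "to", "from",
                   "does", "what", "who", "when", "where", "why"}

      def stem(word):
          # Very crude suffix stripping, for illustration only.
          for suffix in ("ed", "s"):
              if word.endswith(suffix):
                  return word[:-len(suffix)]
          return word

      def tokens(text):
          # Lowercased, stemmed content words.
          return {stem(w) for w in re.findall(r"[a-z]+", text.lower())
                  if w not in STOPWORDS}

      def answer(question, passage):
          # Return the sentence sharing the most content words with the question.
          sentences = re.split(r"(?<=[.!?])\s+", passage)
          question_words = tokens(question)
          return max(sentences, key=lambda s: len(question_words & tokens(s)))

      passage = ("Maple syrup comes from sugar maple trees. "
                 "Farmers collect the sap. "
                 "The bucket has a cover to keep rain and snow out.")

      print(answer("Who collects maple sap?", passage))
      # -> Farmers collect the sap.
      print(answer("Why is the bucket covered?", passage))
      # -> The bucket has a cover to keep rain and snow out.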

    2. Mandarin-English Information (MEI)

    Our globally interconnected world increasingly demands technologies to
    support on-demand retrieval of relevant information in any medium and
    in any language. If we search the web for, say, the loss of life in
    an earthquake in Turkey, by entering keywords in English, the most
    relevant stories are likely to be in Turkish or even Greek.
    Furthermore, the latest information may be in the form of audio files
    of the evening's news. One would like to be able first to find such
    information and then to translate it into English. Finding such
    information is beyond
    the capabilities of most commercially available search engines; good
    automatic translation is even harder. In this project, we will extend
    the state-of-the-art for searching audio and on-line text in one
    language for a user who speaks another language.

    A very large corpus of concurrent Mandarin and English textual and
    spoken news stories is available for conducting such research. These
    textual and spoken documents in both languages will be automatically
    indexed; in the case of spoken documents, this will involve automatic
    speech recognition. Given a query in either language, we will then
    investigate systems that retrieve relevant documents in both languages
    for the
    user. Such cross-lingual and cross-media (CLCM) information retrieval
    is a novel problem with many technical challenges. Several schemes
    for recognizing the audio, indexing the text, and for estimating
    translation models to match queries in one language with documents in
    another language will be investigated in the summer. Applications of
    this research include audio and video browsing, spoken document
    retrieval, automated routing of information, and automatically
    alerting the user when special events occur.
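
    To make the retrieval idea concrete, here is a small sketch in Python
    of query translation followed by document scoring. The "translation
    model" and the documents below are tiny, made-up examples chosen only
    to illustrate the mechanism, not the actual corpus or models to be
    used in the workshop.

      # Toy cross-lingual retrieval: translate English query terms with a
      # (hypothetical) probabilistic dictionary, then rank Mandarin
      # documents by the expected number of matching translations.
      english_query = ["earthquake", "casualties"]

      # p(mandarin_term | english_term); invented numbers for illustration.
      translation = {
          "earthquake": {"dizhen": 0.9, "zhendong": 0.1},
          "casualties": {"shangwang": 0.8, "siwang": 0.2},
      }

      # Mandarin documents, already tokenized (e.g. by speech recognition
      # plus word segmentation); also invented for illustration.
      documents = {
          "doc1": ["tuerqi", "dizhen", "shangwang", "yanzhong"],
          "doc2": ["gupiao", "shichang", "zhendong"],
      }

      def score(doc_tokens, query):
          # Expected number of query-term translations found in the document.
          total = 0.0
          for term in query:
              for zh, prob in translation.get(term, {}).items():
                  if zh in doc_tokens:
                      total += prob
          return total

      ranked = sorted(documents,
                      key=lambda d: score(documents[d], english_query),
                      reverse=True)
      print(ranked)  # doc1 (the earthquake report) ranks above doc2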

    3. Audio-Visual Speech Recognition

    It is well known that humans have the ability to lip-read: we combine
    audio and visual information in deciding what has been spoken,
    especially in noisy environments. A dramatic example is the so-called
    McGurk effect, where the spoken sound /ga/ is superimposed on the
    video of a person uttering /ba/. Most people perceive the speaker as
    uttering the sound /da/.

    We will strive to achieve automatic lip-reading by computers, i.e., to
    make computers recognize human speech even better than is now possible
    from the audio input alone, by using the video of the speaker's face.
    There are many difficult research problems on the way to succeeding in
    this task, e.g., tracking the speaker's head as she moves in the
    video frame, identifying the type of lip movement, guessing the spoken
    words independently from the video and the audio, and combining the
    information from the two signals to make a better guess of what was
    spoken. In the summer, we will focus on a specific problem: how best
    to combine the information from the audio and video signal.

    For example, using visual cues to decide whether a person said /ba/
    rather than /ga/ can be easier than making the decision based on audio
    cues, which can sometimes be confusing. On the other hand, deciding
    between /ka/ and /ga/ is more reliably done from the audio than the
    video. Therefore our confidence in the audio-based and video-based
    hypotheses depends on the kinds of sounds being confused. We will
    invent and test algorithms for combining the automatic speech
    classification decisions based on the audio and visual stimuli,
    resulting in audio-visual speech recognition that significantly
    improves the traditional audio-only speech recognition performance.
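
    As a concrete illustration of the combination problem, the sketch
    below fuses per-sound scores from the two streams with a single
    reliability weight. The scores and the weight are invented for
    illustration; choosing and adapting such weights automatically is
    part of what the workshop will investigate.

      # Sketch of audio-visual decision fusion: weighted log-linear
      # combination of each stream's scores for the candidate sounds.
      # All numbers below are invented for illustration.
      import math

      def fuse(audio_scores, video_scores, audio_weight):
          # Combine the two streams' log-likelihoods and pick the best sound.
          combined = {
              sound: audio_weight * audio_scores[sound]
                     + (1.0 - audio_weight) * video_scores[sound]
              for sound in audio_scores
          }
          return max(combined, key=combined.get)

      # Noisy audio barely separates /ba/ from /ga/, but the lips clearly
      # close for /ba/, so the video stream is confident.
      audio = {"ba": math.log(0.45), "ga": math.log(0.55)}
      video = {"ba": math.log(0.90), "ga": math.log(0.10)}

      # Trusting the video more in noisy conditions, the fused decision is /ba/.
      print(fuse(audio, video, audio_weight=0.3))  # -> ba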

    4. Pronunciation Modeling of Mandarin Casual Speech

    When people speak casually in daily life, they are not consistent in
    their pronunciation. In listening to such casual speech, it is quite
    common to find many different pronunciations of individual words.
    Current automatic speech recognition systems can reach word
    accuracies above 90% when evaluated on carefully produced standard
    speech, but in recognizing casual, unplanned speech, performance drops
    to 75% or even lower. There are many reasons for this. In casual
    speech, one phoneme can shift to another. In Mandarin, for example,
    the initial /sh/ in "wo shi" (I am) is often pronounced weakly and
    shifts toward /r/. In other cases, sounds are dropped. In Mandarin,
    phonemes such as /b/, /p/, /d/, /t/, and /k/ are often reduced and as
    a result are often recognized as silence. These problems are made
    especially severe in Mandarin casual speech since most Chinese are
    non-native Mandarin speakers. Chinese languages such as Cantonese are
    as different from standard Mandarin as French is from English. As a
    result, there is even larger pronunciation variation due to the
    influence of the speakers' native languages. We propose to study
    and model such pronunciation differences in casual speech using
    interviews found in Mandarin news broadcasts. We hope to include
    experienced researchers from both China and the US in the areas of
    pronunciation modeling, Mandarin speech recognition, and Chinese
    phonology.
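
    A natural way to represent such variation is a pronunciation lexicon
    that lists weighted variants for each word; a minimal sketch follows.
    The variants and probabilities are illustrative guesses, not measured
    values, and a real lexicon would be estimated from the broadcast
    interview data.

      # Sketch of a pronunciation lexicon with weighted variants for
      # modeling casual Mandarin speech.  Entries are illustrative only.
      lexicon = {
          # word: list of (phoneme sequence, assumed probability)
          "shi": [(["sh", "i"], 0.6),   # careful, canonical pronunciation
                  (["r", "i"], 0.4)],   # casual variant: /sh/ weakens toward /r/
          "ba":  [(["b", "a"], 0.7),
                  (["a"], 0.3)],        # casual variant: the stop is reduced
      }

      def variants(word):
          # Return a word's pronunciation variants, most likely first.
          return sorted(lexicon.get(word, []), key=lambda v: v[1], reverse=True)

      for phones, prob in variants("shi"):
          print("shi ->", " ".join(phones), prob)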

    * Proposed projects for WS00, Center for Language and Speech
    Processing, Johns Hopkins University, Baltimore, Maryland 21218-2686.

    -- 
            Amy Berdann                      410-516x4778
        Center Administrator                 berdann@jhu.edu
           320 Barton Hall                   http://www.clsp.jhu.edu     
    Center for Language and Speech Processing
        Johns Hopkins University
    


