[Corpora-List] ACL 04 Workshop: Tackling the challenges of terascale human language problems

From: Miles Osborne (miles@inf.ed.ac.uk)
Date: Sun Jan 11 2004 - 17:07:34 MET


    ***** PLEASE DISTRIBUTE *****

    Tackling the challenges of terascale human language problems

    http://www-rohan.sdsu.edu/~malouf/terascale04.html

    Workshop at ACL-2004
    Barcelona, Spain, July 26, 2004

    Description:

    Machine learning methods form the core of most modern speech and
    language processing technologies. Techniques such as kernel methods,
    log-linear models, and graphical models are routinely used to classify
    examples (e.g., to identify the topic of a story), rank candidates (to
    order a set of parses for some sentence) or assign labels to sequences
    (to identify named entities in a sentence). While considerable success
    has been achieved using these algorithms, what has become increasingly
    clear is that the size and complexity of the problems---in terms of
    number of training examples, the size of the feature space, and the
    size of the prediction space---are growing at a much faster rate than
    our computational resources are, Moore's Law notwithstanding. This
    raises real questions as to whether our current crop of algorithms
    will scale gracefully when processing such problems. For example,
    training Support Vector Machines for relatively small-scale problems,
    such as classifying phones in the TIMIT speech dataset, would take an
    estimated six years of CPU time (Salomon et al., 2002). If we wished
    to move to a larger domain and harness, say, all the speech data
    emerging from a typical call center, enormous computational resources
    would clearly need to be devoted to the task.

    Allocation of such vast amounts of computational resources is beyond
    the scope of most current research collaborations, which consist of
    small groups of people working on isolated tasks using small networks
    of commodity machines. The ability to deal with large-scale speech and
    language problems requires a move away from isolated individual groups
    of researchers towards co-ordinated `virtual organizations'.

    The terascale problems that are now emerging demand an understanding
    of how to manage people and resources possibly distributed over many
    sites. Evidence of the timely nature of this workshop can be seen in
    this year's "Text Retrieval Conference" (TREC), which concluded with
    the announcement of a new track for next year devoted specifically
    to scaling information retrieval systems. This clearly
    demonstrates the community need for scaling human language
    technologies.

    To address the large-scale speech and language problems that arise
    in realistic tasks, we must develop scalable machine learning
    algorithms that better exploit the structure of such problems, and
    consider both their computational resource requirements and the
    implications for how we carry out research as a community.

    This workshop will bring together researchers interested in meeting
    the challenges associated with scaling systems for natural language
    processing. Topics include (but are not limited to):

      + exactly scaling existing techniques

      + applying interesting approximations which drastically reduce the
        amount of required computation yet do not sacrifice much in the way
        of accuracy

      + using on-line learning algorithms to learn from streaming data sources

      + efficiently retraining models as more data becomes available

      + experience with using very large datasets, for example by
        applying Grid computing technologies

      + techniques for efficiently manipulating enormous volumes of data

      + human factors associated with managing large virtual organizations

      + adapting methods developed for dealing with large-scale problems
        in other computational sciences, such as physics and biology, to natural
        language processing
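
    As a toy illustration of the on-line learning topic above, here is a
    minimal sketch of a mistake-driven perceptron that updates a linear
    model one example at a time without ever storing the stream. The
    function name and the tiny synthetic stream are purely illustrative,
    not drawn from any workshop paper:

    ```python
    def perceptron_stream(stream, dim):
        """Update a linear model per example; the stream is never stored."""
        w = [0.0] * dim
        for x, y in stream:          # y is +1 or -1
            score = sum(wi * xi for wi, xi in zip(w, x))
            if y * score <= 0:       # mistake-driven update
                w = [wi + y * xi for wi, xi in zip(w, x)]
        return w

    # Tiny synthetic stream: label is +1 iff the first feature exceeds
    # the second.
    stream = [([1.0, 0.0], 1), ([0.0, 1.0], -1),
              ([2.0, 1.0], 1), ([1.0, 3.0], -1)]
    w = perceptron_stream(stream, 2)
    ```

    Each example costs O(dim) time and the memory footprint is constant
    in the length of the stream, which is the property that makes such
    algorithms attractive for terascale data.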

    Invited Speaker:

    TBA

    Provisional program committee:

    Chris Manning, Stanford University
    Dan Roth, Univ of Illinois at Urbana-Champaign
    Ewan Klein, Univ of Edinburgh
    Jun-ichi Tsujii, Tokyo University
    Patrick Haffner, AT&T Research
    Roger Evans, ITRI, Brighton
    Steven Bird, Univ of Melbourne
    Stephen Clark, Univ of Edinburgh
    Thorsten Brants, Google
    Walter Daelemans, Univ of Antwerp
    Yann LeCun, New York University
    John Carroll, Univ. of Sussex

    Organizing Committee:

    Miles Osborne, Univ of Edinburgh
    Robert Malouf, San Diego State University
    Srinivas Bangalore, AT&T Labs-Research

    SUBMISSION FORMAT

    Submissions must use the ACL LaTeX style (available from the ACL 04 web
    page). Paper submissions should consist of a full paper. The page
    limit is eight pages. Reviewing will NOT be blind.

    SUBMISSION PROCEDURE

    Electronic submission only: send a gzipped postscript (preferred) or gzipped
    PDF file with your submission to:

    miles@inf.junk.ed.ac.uk *remove junk from address*

    Please name the file with the surname of the first author (for example, "osborne.ps.gz") and in the subject line put "ACL 04 Workshop".

    DEADLINES

    Paper submission deadline: April 18th, 2004
    Notification of acceptance for papers: April 30th, 2004
    Camera ready papers due: May 15th, 2004
    Workshop date: July 26th, 2004

    Contact Information:

    Miles Osborne (miles@inf.junk.ed.ac.uk) *remove junk from address*


