[Corpora-List] CFP: ICML Workshop - Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining

From: Rayid Ghani (rayid.ghani@accenture.com)
Date: Tue Mar 04 2003 - 01:59:42 MET

    CALL FOR PAPERS
    ICML 2003 Workshop (Co-located with KDD 2003)
    The Continuum from Labeled to Unlabeled Data in Machine Learning and
    Data Mining
    August 21, 2003. Washington, DC.
    http://www.accenture.com/techlabs/icmlworkshop2003/

    Important Dates
    Papers Due: May 1, 2003
    Notification: May 25, 2003
    Final Version Due: June 10, 2003
    Workshop: August 21, 2003

    There is a spectrum of ways to use data in machine learning and data
    mining. At one end is completely unsupervised learning or
    clustering, and at the other end is supervised learning where the target
    output is known for every instance.

    This workshop aims to explore the space between these extremes, with
    particular attention to a variety of real-world applications and
    sources of labels. Techniques that have been proposed include learning
    from unlabeled data with hints, learning from unlabeled and
    positive-only labeled data, learning from distantly and noisily labeled
    data, combining labeled and unlabeled data with co-training, EM, and other
    semi-supervised techniques, and transductive learning, where the test
    data is added as an additional source of unlabeled data. The possible
    sources of labels and hints are also broad. Systematic hand-labeling,
    labels acquired through active learning, and hints derived from domain
    knowledge are among the techniques that may be used.

    Papers addressing novel types of data, methods of diagnosing when
    unlabeled data will help and when it will hinder, and applying
    techniques across multiple application domains and multiple levels of
    supervision are particularly encouraged. Papers discussing the
    acquisition of labels from real-world experts in real-world data mining
    problems are also encouraged. Data mining practitioners working on
    real-world problems with large amounts of captured/stored data but a
    high-cost labeling process are encouraged to submit problem descriptions
    and possible solutions.

    Workshop Format
    The workshop will consist of both regular paper presentations and
    debates.

    Regular Papers
    Regular papers can be up to eight pages, and may address work in
    progress. Papers should be in the format required for ICML submissions.
    The formatting instructions can be found at
    http://www.hpl.hp.com/conferences/icml03/formats/index.html.

    Problem Descriptions from Machine Learning/Data Mining Practitioners
    We invite papers of one to two pages describing a problem domain you
    have encountered or dealt with where training data and/or labels are
    very expensive or hard to obtain. The paper should present a problem
    statement, give background on the domain, and list the sources and
    amount of available training data. We hope these types of papers will
    encourage
    participation from people working on practical applications where
    unlabeled data can potentially be valuable but is not currently
    utilized. We hope to devote a session in the workshop to discuss these
    problems and brainstorm possible solutions and ways to use unlabeled
    data for the problems posed in these papers.

    Debate Position Papers
    Two-page position papers on either side of the following topics are
    solicited. Accepted papers will be published in the workshop
    proceedings, and authors will be expected to debate their position.
    Topics not on this list are also acceptable, if you can coherently argue
    both sides, or can encourage a colleague to submit the opposing
    position.

       - Unlabeled data is only useful when there are a large number of
    redundant features.
       - Why doesn't the No Free Lunch Theorem apply when working with
    unlabeled data?
       - Unlabeled data has to come from the same underlying distribution as
    the labeled data.
       - Can unlabeled data be used in temporal domains?
       - Feature engineering is more important than algorithm design for
    semi-supervised learning.
       - All the interesting problems in semi-supervised learning have been
    identified.
       - Active learning is an interesting "academic" problem.
       - Active learning research without user interface design is only
    solving half the problem.
       - Using unlabeled data in data mining is no different from using it
    in machine learning.
       - Massive data sets pose problems when using current semi-supervised
    algorithms.
       - Off-the-shelf data mining software incorporating labeled and
    unlabeled data is a fantasy.
       - Unlabeled data is only useful when the classes are well separated.

    Submissions should be sent by May 1, 2003 as PDF or PostScript files to
    Rayid.Ghani@accenture.com.

    Organizers
    Rayid Ghani
    Accenture Technology Labs, 161 N. Clark St, Chicago, IL 60601
    rayid.ghani@accenture.com
    +1 (312) 693-6653

    Rosie Jones
    Overture Services, 74 N. Pasadena Ave 3F, Pasadena, CA 91107
    rosie.jones@overture.com
    +1 (626) 229-8536

    Chuck Rosenberg
    Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA 15213
    chuck@cs.cmu.edu
    +1 (412) 268-8078

    Program Committee
    Kristin Bennett, Rensselaer Polytechnic Institute
    Mark Craven, University of Wisconsin
    Zoubin Ghahramani, Gatsby Computational Neuroscience Unit, UCL
    Sally Goldman, Washington University, St. Louis
    Tony Jebara, Columbia University
    Thorsten Joachims, Cornell University
    Stefan Kremer, University of Guelph
    Bing Liu, National University of Singapore
    Andrew McCallum, University of Massachusetts
    Ray Mooney, University of Texas, Austin
    Ion Muslea, University of California, Irvine
    Kamal Nigam, IntelliSeek
    Ellen Riloff, University of Utah
    Dale Schuurmans, University of Waterloo
    Martin Szummer, Microsoft Research, Cambridge
    Sarah Zelikovitz, City University of New York
    Tong Zhang, IBM Research, Yorktown Heights
     
    Rayid Ghani
    Accenture Technology Labs
    312-693-6653
    www.accenture.com/techlabs/ghani


