Corpora: FINAL CALL FOR PAPERS: ACL Workshop on Unsupervised Learning for NLP

Andy Kehler (kehler@ai.sri.com)
Wed, 24 Mar 1999 12:17:11 -0800

**************************************************************
FINAL CALL FOR PAPERS
DEADLINE MARCH 26, 1999
**************************************************************

ACL-99 Workshop

Unsupervised Learning in Natural Language Processing

University of Maryland, College Park, MD, USA
June 21st, 1999

http://www.ai.sri.com/~kehler/unsup-acl-99.html

Endorsed by the Association for Computational Linguistics (ACL)
Special Interest Group on Natural Language Learning (SIGNLL)

WORKSHOP DESCRIPTION

Many of the successes achieved with learning techniques in natural
language processing (NLP) have come from the supervised paradigm, in
which models are trained on data annotated with the target concepts
to be learned. For instance, the target concepts in language modeling
for speech recognition are words, and thus raw text corpora suffice.
The first successful part-of-speech taggers were made possible by the
existence of the Brown corpus (Francis, 1964), a million-word data
set that had been laboriously hand-tagged a quarter of a century
earlier. Likewise, progress in statistical parsing required the
development of the Penn Treebank data set (Marcus et al., 1993), the
result of many staff-years of effort. While it is worthwhile to use
annotated data when it is available, the future success of learning
for natural language systems cannot depend on a paradigm that
requires the creation of large annotated data sets for each new
problem or application. Annotation is prohibitively expensive in both
time and expertise, and the resulting corpora tend to be restricted
to a particular domain, application, or genre.

Thus, long-term progress in NLP will likely depend on unsupervised
and weakly supervised learning techniques, which do not require large
annotated data sets. Unsupervised learning uses raw, unannotated data
to discover the underlying structure that gives rise to emergent
patterns and principles. Weakly supervised learning uses supervised
learning on small, annotated data sets to seed unsupervised learning
on much larger, unannotated data sets. Because these techniques can
identify new and unanticipated correlations in data, they have the
additional advantage of feeding new insights back into more
traditional lines of basic research.
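
To make the weakly supervised paradigm concrete, the sketch below
implements simple self-training in Python: a model is seeded on a
small labeled set and then iteratively absorbs confidently labeled
examples from a larger unlabeled pool. The toy unigram classifier,
the confidence threshold, and the function names are illustrative
assumptions, not a prescribed method.

from collections import Counter

def train(labeled):
    """Fit a toy unigram classifier by counting word/label pairs.
    (An illustrative stand-in for any supervised learner.)"""
    counts = {}
    for words, label in labeled:
        for w in words:
            counts.setdefault(w, Counter())[label] += 1
    return counts

def predict(model, words):
    """Return (best_label, confidence) by letting known words vote."""
    votes = Counter()
    for w in words:
        if w in model:
            label, _ = model[w].most_common(1)[0]
            votes[label] += 1
    if not votes:
        return None, 0.0
    label, n = votes.most_common(1)[0]
    return label, n / sum(votes.values())

def self_train(seed, unlabeled, rounds=5, threshold=0.8):
    """Seed on a small labeled set, then iteratively promote
    confidently labeled examples from the unlabeled pool."""
    labeled, pool = list(seed), list(unlabeled)
    for _ in range(rounds):
        model = train(labeled)
        keep = []
        for words in pool:
            label, conf = predict(model, words)
            if conf >= threshold:
                labeled.append((words, label))  # promote to training data
            else:
                keep.append(words)              # leave in the pool
        pool = keep
    return train(labeled)

Real systems differ from this toy loop mainly in the classifier used
and in how confident predictions are selected for promotion; the
bootstrapping approach to word sense disambiguation (Yarowsky, 1995)
cited below is one well-known instance.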

Unsupervised and weakly supervised methods have been used successfully
in several areas of NLP, including acquiring verb subcategorization
frames (Brent, 1993; Manning, 1993), part-of-speech tagging (Brill,
1997), word sense disambiguation (Yarowsky, 1995), and prepositional
phrase attachment (Ratnaparkhi, 1998). The goal of this workshop is
to discuss, promote, and present new research results (positive and
negative) in the use of such methods in NLP. We encourage submissions
on work applying learning to any area of language interpretation or
production in which the training data does not come fully annotated
with the target concepts to be learned, including:

* Fully unsupervised algorithms
* `Weakly supervised' learning, bootstrapping models from small sets
of annotated data
* `Indirectly supervised' learning, in which end-to-end task
evaluation drives learning in an embedded language interpretation
module
* Exploratory data analysis techniques applied to linguistic data
* Unsupervised adaptation of existing models in changing environments
* Quantitative and qualitative comparisons of results obtained with
supervised and unsupervised learning approaches

Position papers on the pros and cons of supervised vs. unsupervised
learning will also be considered.

FORMAT FOR SUBMISSION

Paper submissions can take the form of extended abstracts or full
papers, not to exceed six (6) pages. Authors of extended abstracts
should note the short timespan between notification of acceptance and
the final paper deadline. Up to two more pages may be allocated for
the final paper depending on space constraints.

Authors are requested to submit one electronic version of their
papers *or* four hard copies. Please submit hard copies only if
electronic submission is impossible. Submissions in PostScript or PDF
format are strongly preferred.

If possible, please conform to the traditional two-column ACL
Proceedings format. Style files can be downloaded from
ftp://ftp.cs.columbia.edu/acl-l/Styfiles/Proceedings/.

Email submissions should be sent to: kehler@ai.sri.com

Hard copy submissions should be sent to:

Andrew Kehler
SRI International
333 Ravenswood Avenue
EK272
Menlo Park, CA 94025

TIMETABLE

Paper submission deadline: March 26
Notification of acceptance: April 16
Camera-ready papers due: April 30

ORGANIZERS

Andrew Kehler (SRI International)
Andreas Stolcke (SRI International)

PROGRAM COMMITTEE

Michael Brent (Johns Hopkins University)
Eric Brill (Johns Hopkins University)
Rebecca Bruce (University of North Carolina at Asheville)
Eugene Charniak (Brown University)
Michael Collins (AT&T Laboratories)
Marie desJardins (SRI International)
Moises Goldszmidt (SRI International)
Andrew Kehler (SRI International)
John Lafferty (Carnegie Mellon University)
Lillian Lee (Cornell University)
Chris Manning (University of Sydney)
Andrew McCallum (Carnegie Mellon University and Just Research)
Ray Mooney (University of Texas at Austin)
Srini Narayanan (ICSI, Berkeley)
Fernando Pereira (AT&T Laboratories)
David Powers (Flinders University of South Australia)
Adwait Ratnaparkhi (IBM Research)
Dan Roth (University of Illinois at Urbana-Champaign)
Andreas Stolcke (SRI International)
Janyce Wiebe (New Mexico State University)
Dekai Wu (Hong Kong University of Science and Technology)
David Yarowsky (Johns Hopkins University)