Corpora: New Corpora from the Linguistic Data Consortium

LDC Office (ldc@unagi.cis.upenn.edu)
Mon, 20 Apr 1998 16:42:01 EDT

Announcing a NEW RELEASE from the
LINGUISTIC DATA CONSORTIUM


COMLEX English Syntax Lexicon, Version 3.0

This is a moderately broad-coverage English lexicon (about
38,000 lemmas) developed at New York University under LDC
sponsorship. It contains detailed information about the
syntactic characteristics of each lexical item and is
particularly detailed in its treatment of subcategorization
(complement structures).

In the current dictionary, nouns have 9 possible features and 9
possible complements; adjectives have 7 features and 14
complements; verbs have 5 features and 92 complements; and
adverbs have 11 positional classes and 12 features. The entries
for 750 frequent verbs contain 100 tags each, where each tag
consists of a pointer to an instance of that verb in a corpus
and the subcategorization appropriate for that instance.

This latest version of COMLEX Syntax has been updated to
include the adverb classes. We have also added diacritics to
foreign words while retaining the unaccented versions, and have
made various other updates to correct and supplement our lexical
entries. For more details about this revised version, please
contact Adam Meyers at New York University (meyers@cs.nyu.edu).

This release is accompanied by the COMLEX Syntax Text Corpus,
Version 2.0. The text corpus consists of material from the
following sources:

The Brown Corpus, Francis, W. Nelson, 1964, Brown University,
Providence

Wall Street Journal Material, Copyright 1989 Dow
Jones, Inc.

San Jose Mercury News, Copyright 1991 San Jose Mercury News

Associated Press, Copyright 1988

Federal Register materials courtesy of IBM; formatted version
copyright 1992, University of Pennsylvania

Computer Library materials copyright owned by Ziff
Communications Company and other parties as their respective
interests may appear.

Institutions that are members of the LDC during the 1998
Membership Year can receive COMLEX Syntax Lexicon 3.0 at no
additional charge, in the same manner as all other text and
speech corpora published by the LDC. Members who wish to
receive this corpus must sign the COMLEX user agreement, which
is available on the Linguistic Data Consortium WWW Home Page at
URL http://www.ldc.upenn.edu/ldc/catalog/index.html.

Nonmembers can obtain a copy of COMLEX Syntax Lexicon 3.0 for a
fee of $1500, for research purposes only. To order a copy,
please email your request to ldc@unagi.cis.upenn.edu. If you
need additional information before placing your order, or would
like to inquire about membership in the LDC, please send email
or call (215) 898-0464.

Further information about the LDC and its available corpora can
be accessed on the Linguistic Data Consortium WWW Home Page at
URL http://www.ldc.upenn.edu/. Information is also available
via ftp at ftp.cis.upenn.edu under pub/ldc; for ftp access,
please use "anonymous" as your login name and give your email
address when asked for a password.