Corpora: ELRA News

Valerie Mapelli (mapelli@elda.fr)
Fri, 06 Aug 1999 15:47:32 +0200

[ We apologise for the duplicate posting of this announcement ]

___________________________________________________________
ELRA
European Language Resources Association
ELRA News
___________________________________________________________

*** ELRA NEW RESOURCES : EUROWORDNET ***

We are happy to announce the availability of some EUROWORDNET resources via
ELRA:

A. Available Wordnets

Dutch wordnet - 44015 synsets
English wordnet (additional relations which are missing in WordNet1.5) -
16361 synsets
Spanish wordnet - 30485 synsets

B. LR(1) Common Components

1. The Inter-Lingual-Index, which is a list of records (ILI-records), in
the form of synsets mainly taken from WordNet1.5 or manually created. An
ILI-record contains:

- synset: set of synonymous words or phrases (mostly from WordNet1.5)
- part-of-speech
- one or more Top-Concept classifications (optional)
- one or more Domain labels (optional)
- a gloss in English (mostly from WordNet1.5)
- a unique ID linking the synset to its source (mostly WordNet1.5)

2. Top-Ontology: an ontology of 63 basic semantic classes based on
fundamental distinctions. Through the Top-Ontology, all the wordnets can
be accessed using a single language-independent classification scheme.
Top-Concepts are assigned only to ILI-records.

3. Domain-ontology: an ontology of subject-domains optionally assigned to
ILI-records.

4. A selection of ILI-records, the so-called Base-Concepts, which play a
major role in the different wordnets. These Base-Concepts form the core of
all the wordnets. All the Base-Concepts are classified in terms of the
Top-Concepts that apply to them.

5. WordNet1.5 (91591 synsets; 168217 meanings; 126520 entry words) in
EuroWordNet format.
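To make the ILI-record structure and the selection of Base-Concepts by Top-Concept more concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory model: the class name, the field names, the `is_base_concept` flag and the Top-Concept labels in the example are illustrative only, and are neither the EuroWordNet import format nor a Polaris interface.

```python
from dataclasses import dataclass, field


# Hypothetical in-memory model of an ILI-record (illustrative field names,
# not the EuroWordNet import format).
@dataclass
class ILIRecord:
    ili_id: str        # unique ID linking the record to its source (e.g. WordNet1.5)
    synset: list[str]  # synonymous words or phrases
    pos: str           # part-of-speech
    gloss: str         # English gloss
    top_concepts: set[str] = field(default_factory=set)  # optional Top-Concept classifications
    domains: set[str] = field(default_factory=set)       # optional Domain labels
    is_base_concept: bool = False  # hypothetical flag marking a Base-Concept


def base_concepts_under(records: list[ILIRecord], top_concept: str) -> list[ILIRecord]:
    """Return the Base-Concepts classified under the given Top-Concept."""
    return [r for r in records if r.is_base_concept and top_concept in r.top_concepts]


if __name__ == "__main__":
    records = [
        ILIRecord("ili-1", ["car", "auto"], "n", "a motor vehicle", {"Artifact"}, is_base_concept=True),
        ILIRecord("ili-2", ["dog"], "n", "a domesticated canine", {"Living"}, is_base_concept=True),
        ILIRecord("ili-3", ["hammer"], "n", "a hand tool", {"Artifact"}),
    ]
    print([r.ili_id for r in base_concepts_under(records, "Artifact")])  # ['ili-1']
```

Keeping Top-Concepts on the ILI-record rather than on individual words mirrors the point made above: the classification is language-independent and lives in one place.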

C. LR(2) Language-Specific Components

Wordnets produced in the first project (LE2-4003):

- Dutch wordnet
- English wordnet (additional relations which are missing in WordNet1.5)
- Italian wordnet
- Spanish wordnet

Wordnets added in the extension of the project (LE4-8328):

- German wordnet
- French wordnet
- Czech wordnet
- Estonian wordnet

The language-specific wordnets are language-internal structures containing at least:

· the set of variants or synonyms making up the synset
· part-of-speech
· language-internal relations to other synsets
· equivalence relations with ILI-records
· a unique ID linking the synset to its source
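Because Top-Concepts are attached only to ILI-records, a language-specific synset is reached through its equivalence relations. The Python sketch below illustrates that lookup under stated assumptions: the `Synset` class, its field names, the relation label `hypernym` and all IDs are hypothetical, and each wordnet is assumed to be loaded into memory as a plain list of synsets.

```python
from dataclasses import dataclass, field


# Hypothetical model of a language-specific synset (illustrative names only).
@dataclass
class Synset:
    synset_id: str      # unique ID linking the synset to its source
    variants: list[str] # variants or synonyms making up the synset
    pos: str            # part-of-speech
    relations: dict[str, list[str]] = field(default_factory=dict)  # relation -> synset IDs in the same wordnet
    ili_equivalents: list[str] = field(default_factory=list)       # equivalent ILI-record IDs


def synsets_by_top_concept(
    wordnets: dict[str, list[Synset]],
    ili_top_concepts: dict[str, set[str]],
    top_concept: str,
) -> dict[str, list[str]]:
    """For each language, list the synsets whose ILI equivalents carry top_concept."""
    result: dict[str, list[str]] = {}
    for lang, synsets in wordnets.items():
        result[lang] = [
            s.synset_id
            for s in synsets
            # Top-Concepts live on ILI-records, so check each equivalence link.
            if any(top_concept in ili_top_concepts.get(ili, set()) for ili in s.ili_equivalents)
        ]
    return result


if __name__ == "__main__":
    ili_top_concepts = {"ili-1": {"Artifact"}, "ili-2": {"Living"}}
    wordnets = {
        "nl": [
            Synset("nl-1", ["auto", "wagen"], "n", {"hypernym": ["nl-9"]}, ["ili-1"]),
            Synset("nl-2", ["hond"], "n", ili_equivalents=["ili-2"]),
        ],
        "es": [Synset("es-1", ["coche", "automóvil"], "n", ili_equivalents=["ili-1"])],
    }
    print(synsets_by_top_concept(wordnets, ili_top_concepts, "Artifact"))
    # {'nl': ['nl-1'], 'es': ['es-1']}
```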

Each wordnet will be distributed with LR1 and will include documentation on
LR1 and on the distributed wordnet. All the data will be distributed as
text files in the EuroWordNet import format and as Polaris database files
(see LR3 below). The EuroWordNet viewer (Periscope, see LR3 below) can be
used to access the database version. A Polaris licence is required to modify
and extend the database version.

The wordnets are distributed without:

· glosses
· usage labels
· morpho-syntactic properties
· examples
· word-to-word translations

D. LR(3) Software

The multilingual EUROWORDNET Database (partly Foreground, partly
Background) consists of three components:

• The actual wordnets in Flaim database format, an indexing and compression
format from Novell.
• Polaris (Louw 1997): a wordnet editing tool for creating, editing and
exporting wordnets.
• Periscope (Cuypers and Adriaens 1997): a graphical database viewer for
viewing and exporting wordnets.

The Polaris tool is a re-implementation of the Novell ConceptNet toolkit
(Díez-Orzas et al 1995) adapted to the EuroWordNet architecture. Polaris
can import new wordnets or wordnet fragments from ASCII files with the
correct import format and it creates an indexed EUROWORDNET Database.
Furthermore, it allows a user to edit and add relations in the wordnets and
to formulate queries. The Polaris toolkit makes it possible to visualise
the semantic relations as a tree-structure that can directly be edited.
These trees can be expanded and collapsed by clicking on word meanings and by
specifying so-called TABs, which indicate the kind and depth of the relations
to be shown. Expanded trees or sub-trees can be stored as a set of
synsets, which can be manipulated, saved or loaded. Additionally, it is
possible to access the ILI or the ontologies, and to switch between the
wordnets and ontologies via the ILI. Finally, it contains an interface to
project sets of synsets across wordnets.
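The TAB mechanism described above, which selects the kind and depth of relations to display, can be sketched as a depth-limited traversal. This is a minimal Python illustration, not the Polaris implementation: it assumes a wordnet held as a hypothetical dictionary from synset ID to relation kind to target IDs, and models a TAB simply as a set of relation kinds plus a maximum depth. The relation names used are illustrative.

```python
from typing import Optional

# Hypothetical wordnet graph: synset ID -> {relation kind -> [target synset IDs]}
Graph = dict[str, dict[str, list[str]]]


def expand(
    graph: Graph,
    node: str,
    kinds: set[str],
    max_depth: int,
    depth: int = 0,
    seen: Optional[set[str]] = None,
) -> list[tuple[int, str]]:
    """Pre-order, depth-limited expansion of `node`, following only `kinds`.

    Returns (depth, synset_id) rows in display order. A synset reachable by
    several paths is shown once (first path found), which also stops cycles.
    """
    if seen is None:
        seen = set()
    seen.add(node)
    rows = [(depth, node)]
    if depth < max_depth:
        for kind in sorted(kinds):  # sorted for a stable display order
            for target in graph.get(node, {}).get(kind, []):
                if target not in seen:
                    rows.extend(expand(graph, target, kinds, max_depth, depth + 1, seen))
    return rows


if __name__ == "__main__":
    graph = {
        "vehicle": {"hyponym": ["car", "bicycle"]},
        "car": {"hyponym": ["taxi"], "meronym": ["wheel"]},
        "taxi": {"hyponym": ["minicab"]},
    }
    # Print the expanded tree with two spaces of indentation per level.
    for depth, synset_id in expand(graph, "vehicle", {"hyponym"}, max_depth=2):
        print("  " * depth + synset_id)
```

With a hyponym-only TAB and a depth of 2, `minicab` lies beyond the depth limit and `wheel` is skipped because meronymy was not selected.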

The Periscope program is a public viewer that can be used to look at
wordnets created by the Polaris tool and to compare them in a graphical
interface. Word meanings can be looked up and trees can be expanded.
Individual meanings or complete branches can be projected on another
wordnet or wordnet structures can be compared via the equivalence relations
with the Inter-Lingual-Index. Selected trees can be exported to text files.
The Periscope program cannot be used for importing or changing wordnets.
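Projecting a branch onto another wordnet via the Inter-Lingual-Index amounts to following equivalence relations out to ILI-records and back in to the target wordnet. A minimal Python sketch, assuming each wordnet is reduced to a mapping from synset IDs to the ILI-record IDs they are equivalent to; the function name and all IDs are hypothetical and say nothing about how Periscope or Polaris are actually implemented.

```python
def project(
    branch: list[str],
    source_equiv: dict[str, list[str]],
    target_equiv: dict[str, list[str]],
) -> tuple[dict[str, list[str]], list[str]]:
    """Project source synsets onto a target wordnet through shared ILI-records.

    source_equiv / target_equiv map synset ID -> equivalent ILI-record IDs.
    Returns (matches, gaps): matches maps each source synset to the target
    synsets sharing at least one ILI-record; gaps lists unmatched synsets.
    """
    # Invert the target wordnet: ILI-record ID -> target synset IDs.
    by_ili: dict[str, list[str]] = {}
    for synset_id, ilis in target_equiv.items():
        for ili in ilis:
            by_ili.setdefault(ili, []).append(synset_id)

    matches: dict[str, list[str]] = {}
    gaps: list[str] = []
    for synset_id in branch:
        hits: list[str] = []
        for ili in source_equiv.get(synset_id, []):
            for target in by_ili.get(ili, []):
                if target not in hits:  # keep order, drop duplicates
                    hits.append(target)
        if hits:
            matches[synset_id] = hits
        else:
            gaps.append(synset_id)
    return matches, gaps


if __name__ == "__main__":
    dutch = {"nl-auto": ["ili-1"], "nl-fiets": ["ili-2"]}
    spanish = {"es-coche": ["ili-1"], "es-automovil": ["ili-1"], "es-perro": ["ili-3"]}
    print(project(["nl-auto", "nl-fiets"], dutch, spanish))
```

The `gaps` list is what makes the mapping useful for comparison as well as projection: it shows which parts of a branch have no counterpart in the other wordnet.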

E. Prices

The prices are based on the number of synsets in each wordnet and differ
according to the type of use and ELRA membership. For more information,
please contact ELRA or visit our Web site.

F. Technical support

Technical support may be provided by members of the consortium. It will be
implemented through bilateral agreements between the User and the consortium
member responsible for the data acquired by the User. As an indication,
support contracts will be on a yearly basis and will cost 10-20 KEURO per year.

=====================================
For further information, please contact :

ELRA/ELDA
55-57 rue Brillat-Savarin
F-75013 Paris, France
Tel : +33 01 43 13 33 33
Fax : +33 01 43 13 33 30
E-mail : mapelli@elda.fr

or visit our Web site:

http://www.icp.grenet.fr/ELRA/home.html
=====================================