Program: Linguistic Databases

John Nerbonne (nerbonne@let.rug.nl)
Wed, 8 Feb 1995 13:41:13 +0100 (MET)

23-24 March 1995

University of Groningen
Centre for Language and Cognition
Centre for Behavioural and Cognitive Neurosciences

A database is simply a declarative representation of information
which is designed to make data entry and retrieval easy, but is not
optimized for other processing. Databases have long been standard
repositories in phonetic research, but they are finding increasing use
not only in phonology, morphology, syntax, historical linguistics and
dialectology but also in areas of applied linguistics such as
lexicography and computer-assisted language learning. Normally, they
serve as repositories for large amounts of data, but they are also
important for the organization they impose, which serves to ease
access for researchers and applications specialists.

The purpose of a conference specifically on this topic is to provide
a forum for the exchange of information and views on the proper use of
databases within the various subfields of linguistics. Our call for
papers expressed the hope that we would receive abstracts on the
following topics:

1. Databases vs. annotated corpora, pros and cons.
2. Needs wrt acoustic data, string data, temporal data. Existing facilities.
3. Developing (maximally) theory-neutral db schemas for annotation systems.
4. Commercially available systems vs. public domain systems.
What's available?
5. Uses in grammar checking, replication of results.
6. Needs of applications such as lexicography.
7. Making use of CD-ROM technology.
8. Existing professional expertise: Linguistic Data Consortium (LDC), TEI.

Invited Speakers

Jan Aarts "Annotation of Corpora: General Issues and the Nijmegen Experience"
Prof. of English, Nijmegen, leader of TOSCA, Linguistic Database projects

Sylviane Granger "The Computer Learner Corpus: a Testbed for
Electronic EFL Tools"
Prof. of English, Louvain

Mark Liberman "Electronic Publication of Linguistic Data"
Prof. of Linguistics & Computer Science, Pennsylvania;
Director, Ling. Data Consortium

Gary Simons "Multilingual Data Processing in the CELLAR Environment"
Director, Academic Computing, Summer Institute of Linguistics, Dallas

Program Committee: Tjeerd de Graaf (Phonetics), Tette Hofstra (Historical
Ling.), John Nerbonne (Computational Ling., Program Chair), and Herman
Wekker (Descriptive Ling.).

Local Arrangements: Duco Dokter (d.a.dokter@let.rug.nl)

Thurs., 23 March

10:00 Registration and Coffee
10:45 Opening

Annotation of Corpora

11:00 Jan Aarts, Nijmegen (Invited Speaker)
"Annotation of Corpora: General Issues and the Nijmegen Experience"

Contributed Talks: Corpora and Test-Suite Construction

12:00 Susan Armstrong (ISSCO, Geneva) and Henry Thompson (Edinburgh)
"A Presentation of MLCC: Multilingual Corpora for Cooperation"

12:30 Lunch

2:00 Stephan Oepen and Klaus Netter (DFKI, Saarbrücken)
"TSNLP Test Suites for Natural Language Processing"
2:30 Martin Volk (Zurich), Arne Fitschen and Stefan Pieper
(Koblenz-Landau)
"Markup of a Test Suite with SGML"
3:00 Jacques Le Maitre and Monique Rolbert (Marseille)
"From Annotated Corpora to Databases: the SgmlQL Language"
3:30 Tea

Contributed Talks: Pure and Applied Linguistics

12:00 Eric Fudge and Linda Shockey (Reading)
"The Reading Syllable Database"

12:30 Lunch

2:00 Dietmar Zaefferer (Munich)
"Options for a Cross-Linguistic Reference Grammar Database"
2:30 Siobhan Devlin, John Tait and Chris Bloor (Sunderland)
"The Use of a Psycholinguistic Database in the Simplification of
Text for the Aphasic Reader"
3:00 Masahito Watanabe (Meikai, Yokohama)
"A Better Language Database for Language Teaching"
3:30 Tea

Second Language Learning

4:00 Sylviane Granger, Louvain (Invited Speaker)
"The Computer Learner Corpus: a Testbed for Electronic EFL Tools"

5:15 Demonstrations
TSNLP (Oepen and Netter)
LeX4 (Gebhardi)
ETCverif (Chollet) (tentative)
ALD (Haimerl)

8:00 Dinner

Fri., 24 March

Multilingual Databases

9:00 Gary Simons, Summer Institute of Linguistics, Dallas (Invited Speaker)
"Multilingual Data Processing in the CELLAR Environment"

Contributed Talks: Lexical Databases

10:00 Andrew Bredenkamp, Louisa Sadler, Andrew Spencer, and Marina
Zaretskaya (Essex)
"Investigating Argument Structure: The Nominalisation Database"
10:30 Coffee
11:00 Gunter Gebhardi (Berlin)
"Aspects of Lexicon Maintenance in Computational Linguistics"
11:30 Elisabeth Godbert (Marseille)
"Elaboration of a Lexical Database with the help of a Semantic Network"
12:00 Kerstin Fischer and Michaela Johanntokrax (Bielefeld)
"A Lexical Database for the Automatic Recognition of Discourse
Particles"

12:30 Lunch

2:00 Walid Saba (Bell Labs, Middletown)
"An Extensible Class Library for an Object-Oriented Lexicon"
2:30 Oliver Christ (Stuttgart)
"Linking WordNet to a Corpus Query System"

Contributed Talks: Phonetic Databases

10:00 Peter Roach, Jane Setter, Simon Arnfield (Reading), Mitch Waterman,
Carol Sherrard and Peter Greasley (Leeds)
"Adding Paralinguistic and Psychological Information to a Spoken
Language Database"
10:30 Coffee
11:00 Kamel Bensaber, Jean Serignat and Pascal Perrier (ICP, Grenoble)
"BD_ART: Multimedia Articulatory Database"
11:30 Werner Deutsch, Ralf Vollmann, Anton Noll and Sylvia Moosmüller
(Academy of Sciences, Vienna)
"An Open Systems Approach for Acoustic-Phonetic Continuous Speech
Databases"
12:00 Lou Boves and Els den Os (SPEX, Leidschendam)
"Linguistic Research using Large Speech Corpora"

12:30 Lunch

2:00 Edgar Haimerl, Salzburg
"A Database Application for the Generation of Phonetic Atlas Maps"
2:30 Gérard Chollet, Jean-Luc Cochard, Cédric Jaboulet, Robert van
Kommer and Philippe Langlais (IDIAP, Martigny)
"Swiss French Polyphone: a Telephone Speech Database to Develop
Interactive Voice Servers"

Professional Support

3:00 Mark Liberman, Pennsylvania (Invited Speaker)
"Electronic Publication of Linguistic Data"

4:00 Closing & "Borrel"