Corpora: CFP CL Special Issue: Web As Corpus

From: Adam Kilgarriff (adam.kilgarriff@itri.brighton.ac.uk)
Date: Fri Sep 28 2001 - 16:56:23 MET DST

                               CALL FOR PAPERS
                  SPECIAL ISSUE of COMPUTATIONAL LINGUISTICS
                                Web as Corpus

    Guest editors
    Adam Kilgarriff, ITRI, University of Brighton and
                          Oxford University Press
    Gregory Grefenstette, Clairvoyance Corporation

    The Web is an immense, multilingual, freely available corpus. As with
    other large new corpora, computational linguists have been stimulated
    by its presence. Web research includes many of the most talked-about
    papers of recent ACL and other meetings (e.g., Resnik, ACL '99; Brill,
    "Does the web change everything?", ACL SIGNLL '01).

    In comparison with most corpora studied to date, the web is
    heterogeneous and noisy. Methods for handling the noise, and
    extracting and exploiting subcorpora meeting particular criteria, are
    being developed by a widening population ranging from students who
    realise that it is an obvious place to obtain their corpus for free,
    to companies who seek to use HLT techniques on datasets other than the
    ones HLT researchers usually use.

    NLP can both give to, and take from, the web (distinction due to
    Dragomir Radev). It can give to the web technologies such as
    summarisation, MT and question-answering. But the giving side of the
    equation looks only at short-to-medium term goals. For the longer
    term, for 'giving' as well as for other purposes, a deeper
    understanding of the linguistic nature of the web and its potential
    for CL/NLP is required. For that, we must take the web itself, in
    whatever limited way, as an object of study, and uncover what it has
    to tell us about the nature of language. The Special Issue will focus
    on how we can use the web, rather than how we can help web users.

    The issues which we will expect Special Issue papers to cover include:

          Lexical data derived from the Web
          Classifying Web language; the range of text types on the Web
          Mapping Web documents onto existing ontologies;
                              implications for ontologies
          Clustering in an open corpus
          The multilingual Web as a resource for translation
          CL/HLT engagement with the Semantic Web

    SCHEDULE

    Papers due: 30 April 2002

    SUBMISSION PROCEDURE

    Initial submissions should be sent to:
    1. Guest Editors
          adam.kilgarriff@itri.brighton.ac.uk, grefen@clairvoyancecorp.com
    2. Publishing Editor
          Julia Hirschberg (julia@research.att.com)

    For initial submissions only, authors should send electronic copies
    (PostScript, PDF, RTF, or DOC) to both the Guest Editors and the
    Publishing Editor. Please indicate that the submission is for the
    Special Issue of Computational Linguistics: Web as Corpus.

    Questions about submissions should be directed to the two Guest
    Editors, rather than the Journal or Publishing Editors.


