[Corpora-List] Workshop on Shallow Processing of Large Corpora - Second Call for Papers

From: Kiril Simov (kivs@bultreebank.org)
Date: Thu Jan 02 2003 - 15:37:01 MET

                             Second Call for Papers

          Workshop on Shallow Processing of Large Corpora
             http://www.bultreebank.org/SProLaC.html
                          (SProLaC 2003)
                     CORPUS LINGUISTICS 2003
            Lancaster University (UK), 27 March, 2003

    The workshop will take place on the 27th of March 2003 at the
    CORPUS LINGUISTICS 2003 Conference at Lancaster University (UK).
    http://www.comp.lancs.ac.uk/ucrel/cl2003/

    Workshop motivation and aims:

    Corpora have developed in two main directions:

        - large corpora of at least 100 million tokens, and

        - small corpora of up to 1 million tokens.

    The data in the former are only morpho-syntactically annotated,
    while the data in the latter are assigned more detailed syntactic
    (and semantic) information. Needless to say, both types of
    language corpora are valuable. The question arises, however,
    whether it is possible to build a really large corpus that is
    fully processed linguistically. Since this is a hard task that
    involves metadata problems (theories, availability of appropriate
    tools, etc.), we put the stress on shallow parsing of unrestricted
    data. In our view, the automated creation of such a resource
    is a task of great importance. It would serve as a
    template for linguistic research, consistency checking and
    validation, large-scale applications in Information Retrieval
    and Information Extraction, testing of machine learning
    algorithms, and many other uses. This task is related to other
    subtasks, such as an adequate combination of diverse
    shallow processing techniques in a sound and robust
    processor, and smoothing the transition from shallow parsing
    approaches to stages of deeper linguistic analysis.

    The workshop aims to be a forum for researchers to
    present their work in the area of Computational Corpus
    Linguistics and Language Engineering and to discuss
    the problems in the design, management, linguistic interpretation
    and exploration of unrestricted data from both perspectives.

    We envisage a one-day workshop and 10-12 presentations.

    Topics of interest:

        - design principles for shallow-parsed large corpora;
        - text segmentation and preprocessing;
        - definition of the connection between the levels
          of processing;
        - chunking and partial parsing of large amounts of text;
        - machine learning methods with large coverage;
        - software systems for the management of and access
          to shallow-parsed large corpora;
        - applications of shallow-parsed large corpora.

    There will be a general discussion at the end of the workshop.

    Important dates:

    Deadline for workshop abstract submission: 10th January 2003
    Notification of acceptance: 3rd February 2003
    Final version of paper for workshop proceedings: 3rd March 2003

    Submissions:

    Papers should describe existing research connected to
    the topics of the workshop. The presentation at the
    workshop will be 25 minutes long (20 minutes for
    presentation and 5 minutes for questions and discussion).
    Each submission should show: title; author(s); affiliation(s);
    and contact author's e-mail address, postal address,
    telephone and fax numbers. Abstracts (maximum 500 words,
    plain-text format) should be sent to:

    Kiril Simov
    Email: kivs@bultreebank.org

    The final version of the accepted papers should follow
    the format for the main conference and should be no more
    than 10 pages long. Instructions for formatting can be
    found on the main conference page.

    The workshop will have its own proceedings.

    Registration:

    The registration will be managed by the local organisers
    of the main conference.

    Programme committee:

    Michael Barlow, USA
    Tomaz Erjavec, Slovenia
    Silvia Hansen, Germany
    Atanas Kiryakov, Bulgaria
    Sandra Kuebler, Germany
    Ghassan Mourad, France
    Joakim Nivre, Sweden
    Kemal Oflazer, Turkey
    Karel Oliva, Austria
    Petya Osenova, Bulgaria (co-chair)
    Vladimir Petkevic, Czech Republic
    Adam Przepiórkowski, Poland
    Geoffrey Sampson, UK
    Kiril Simov, Bulgaria (co-chair)
    Milena Slavcheva, Bulgaria
    Marko Tadic, Croatia
    Dan Tufis, Romania
    Tylman Ule, Germany
    Tamas Varadi, Hungary
    Nikolaj Vazov, Bulgaria
    Andreas Wagner, Germany

    Organizing committee:

    Kiril Simov
    BulTreeBank Project
    Linguistic Modelling Laboratory, CLPP,
    Bulgarian Academy of Sciences
    Acad. G.Bonchev St. 25A
    1113 Sofia, Bulgaria
    Tel: (+359 2) 979 2825
    Fax: (+359 2) 70 72 73
    Email: kivs@bultreebank.org
    http://www.BulTreeBank.org

    Petya Osenova
    BulTreeBank Project
    Linguistic Modelling Laboratory, CLPP,
    Bulgarian Academy of Sciences
    Acad. G.Bonchev St. 25A
    1113 Sofia, Bulgaria
    Tel: (+359 2) 979 2825
    Fax: (+359 2) 70 72 73
    Email: petya@bultreebank.org
    http://www.BulTreeBank.org


