[Corpora-List] 2nd call : DISCOURSE ORGANISATION IN THE AGE OF THE DIGITAL DOCUMENT - ATALA Workshop 22 June 04

From: Marie-Paule PERY-WOODLEY (pery@univ-tlse2.fr)
Date: Mon Jan 19 2004 - 15:05:00 MET


    2nd CALL FOR PAPERS

    ==================================================================================
    MODELLING AND DESCRIBING DISCOURSE ORGANISATION IN THE AGE OF THE DIGITAL
    DOCUMENT
    ==================================================================================

    Workshop proposed by ATALA (Association pour le traitement automatique des
    langues),
    as part of the SEMAINE DU DOCUMENT NUMERIQUE (Digital Document
    Week) (<www.univ-lr.fr/sdn2004>)
    La Rochelle, France, 22 June 2004

    organised by Marie-Paule Péry-Woodley,
    Equipe de recherche en Syntaxe et Sémantique/Université de Toulouse-Le
    Mirail (pery@univ-tlse2.fr)

    The Digital Document Week aims to gather research communities dealing with
    digital documents from a variety of angles: media, technical and social
    modes of mediation, relation with human activity. Within this framework, the
    ATALA workshop wishes to broach these questions from a linguistic point of
    view, focussing on digital documents as discourse, characterised by an
    internal organisation which needs to be understood and may be exploited in
    computer-based systems. The workshop aims to bring together three research
    areas concerned with the development of digital documents: the study of
    discourse organisation, corpus linguistics, and computer-based applications
    for the exploitation of digital documents.

    For text and discourse linguistics, the proliferation of digital documents
    leads to new opportunities and new research questions, such as:
    - the application of corpus analysis methods to discourse: what kind of
    data can be regarded as relevant at this level of linguistic investigation?
    - the development of novel ways of accessing documents, which leads to a
    new emphasis on text structure and the potential exploitation of surface
    markers;
    - the impact of new document types on basic concepts in the field:
    cohesion, coherence, metadiscursive signalling.

    This workshop on written discourse organisation aims to bring together
    research from three domains which must seek points of convergence in the
    light of these new prospects:

    1. Discourse organisation

    In order to apprehend a sequence of utterances as discourse, it is
    necessary to understand its organisation (to identify its segments and
    perceive their hierarchy and their relations). An old and fertile tradition
    approaches discourse organisation via the notion of discourse relations:
    semantico-pragmatic links between segments (propositions or sets of
    propositions) (cf. Péry-Woodley (ed) 2001). Other modes of organisation may
    be envisaged, via the notion of theme or topic for instance, or more
    recently through the discourse framing hypothesis (Charolles 1997).
    Research in this field can be placed on a continuum from purely "conceptual"
    modelling to empirical methods (automatic segmentation, cf. Hearst 1997;
    shallow human or automatic analyses, cf. Teufel & Moens 1999). The
    challenge is to hold both ends of the continuum in order to draw
    connections between the way “things are put” in texts and the processes
    underlying discourse organisation at different levels of granularity (local
    vs. global organisation). The relationship between modelling approaches and
    empirical research has often seemed problematic, with empirical studies
    running the risk of losing track of structure as they focus on surface
    markers, while conceptual models tend to be difficult to test empirically.
    Corpus-based approaches, greatly facilitated by the move into the digital
    age, are bringing considerable changes to the discourse field, as they have
    done elsewhere in linguistics (Conrad 2002).

    2. Corpus-based studies of linguistic correlates of discourse organisation

    As noted by several authors (Biber et al. 1998 inter alia), though research
    on discourse organisation tends to make regular use of authentic data, the
    corpus is often seen as a source of examples rather than the object of the
    analysis as such. The implementation of a fully-fledged “corpus approach”
    in the field of discourse organisation carries with it many difficulties:
    corpus construction (common sampling-based techniques make it impossible…),
    the role of quantitative analysis, and above all the definition of relevant
    data making it possible to draw the connection between surface markers
    (which may be mere epiphenomena) and the multiple principles underlying
    complex hierarchical organisation.
    A gap can also be observed between linguistic approaches (low coverage and
    high reliability) and numerical approaches (high coverage and low
    reliability). Articulating these approaches may open new prospects, leading
    to fresh insights into discourse organisation principles as well as more
    operational methods for applications.

    3. Computer-based systems for the exploitation of digital documents

    Applications for which the relevant unit is the whole document are little
    concerned with questions of discourse organisation, but those concerned with
    intra-document browsing, selective synthesis or multi-level visualisation
    must work their way inside the documents and therefore cannot consider them
    as simple “bags of words”: they have to take into account the organisation
    into thematic or rhetorical chunks and text architecture (cf. Luc & Virbel
    2001). These objectives bring about new research questions, in particular
    around the articulation of different organisational levels in long
    documents (where browsing aids acquire particular relevance).

    This call for papers is addressed to researchers who are already working on these
    interactions, as well as those whose work is in one of the domains referred
    to but who are interested in a dialogue with other discourse approaches.
    Descriptive studies which pay specific attention to methodology will be
    particularly welcome.

    Some relevant themes (non-exhaustive list):
    - identification of objects or text zones corresponding to text or
    discourse acts (conclusions, explanations, evaluations, …)
    - discourse organisation markers (from markers to relations: inductive
    approach): connection, indexing (discourse frames), textual metadiscourse
    - linguistic characterisation of discourse functions (from functions to
    markers: deductive approach)
    - segmentation (automatic or manual): “topic shifts”, clues to segment
    boundaries (lexico-syntactic, typographical, dispositional)
    - articulation between local and global organisation
    - impact of discourse genre on discourse organisation and its linguistic
    markers
    - analysis and exploitation of document architecture
    - topological approaches
    - discourse annotation

    SUBMISSION PROCEDURE

    A summary (2-4 pages, in Word, PDF or PS format) should be e-mailed by
    30 January 2004 to Marie-Paule Péry-Woodley (<pery@univ-tlse2.fr>).

    Notification of acceptance will be given by 15 March 2004.

    ***************************************************************************

    References

    Biber, D., Conrad, S., & Reppen, R. (1998). Corpus linguistics:
    Investigating language structure and use. Cambridge: Cambridge University
    Press.
    Charolles, M. (1997). L'encadrement du discours : Univers, champs, domaines
    et espaces (Cahier de Recherche Linguistique 6). Université de Nancy 2.
    Conrad, S. (2002). Corpus linguistics approaches for discourse analysis.
    Annual Review of Applied Linguistics, 22, 75-95.
    Hearst, M. (1997). TextTiling: segmenting text into multi-paragraph
    subtopic passages. Computational Linguistics, 23(1), 33-64.
    Luc, C., & Virbel, J. (2001). Le modèle d'architecture textuelle :
    fondements et expérimentation. Verbum, 23(1), 103-123.
    Péry-Woodley, M.-P. (ed.) (2001). Cohérence et relations de discours à
    l'écrit. Présentation. Verbum, 23(1).
    Teufel, S., & Moens, M. (1999). Discourse-level argumentation in scientific
    articles: human and automatic annotation. In: Towards Standards and Tools
    for Discourse Tagging. ACL 1999 Workshop.

    ___
    Marie-Paule PERY-WOODLEY
    ___________________________________________________________________
    ERSS / Sciences du Langage
    Universite de Toulouse Le Mirail Tel.: 33(0)5 61 50 46 76/-36 09
    5 allees Antonio-Machado Fax: 33(0)5 61 50 42 12
    F-31058 TOULOUSE CEDEX Email: pery@univ-tlse2.fr


