RE: [Corpora-List] Request for advice on creating a learners' corpus

From: William Gregory Sakas (sakas@hunter.cuny.edu)
Date: Wed Jan 26 2005 - 19:34:49 MET


    Hi Victoria,

    You might also want to get in touch with Martin Chodorow
    who has done some work with English corpora of essays
    written by Japanese English-language learners.

    martin.chodorow@hunter.cuny.edu

    Best,
    -- Wm

    William Gregory Sakas, Ph.D.
    Computer Science and Linguistics
    Hunter College and the Graduate Center
    City University of New York
     
    Voice: (212) 772.5211
    Fax: (212) 772.5219
    Email: sakas@hunter.cuny.edu
    Web: http://www.hunter.cuny.edu/cs/Faculty/Sakas/
     
     

    -----Original Message-----
    From: owner-corpora@lists.uib.no [mailto:owner-corpora@lists.uib.no] On
    Behalf Of Eric Atwell
    Sent: Wednesday, January 26, 2005 11:50 AM
    To: Victoria Muehleisen
    Cc: CORPORA@UIB.NO; Latifa Al-Sulaiti
    Subject: Re: [Corpora-List] Request for advice on creating a learners'
    corpus

    Victoria,

    Latifa Al-Sulaiti was in a similar position about a year and a half ago:
    she planned to collect a million-word Corpus of Contemporary Arabic
    - native-speaker texts rather than learner texts, but even so she faced
    similar technical issues, as her background was in linguistics and
    language teaching rather than computing, and she didn't start with prior
    knowledge about seeking permissions, corpus structure and management,
    XML file format, markup info to add to file headers, etc.
    Her initial version of the corpus is now complete and online;
    see http://www.comp.leeds.ac.uk/latifa

    Her methods and solutions to the problems along the way are documented
    in her MSc Thesis, also online:

    Latifa Al-Sulaiti (MSc), "Designing and Developing a Corpus of
    Contemporary Arabic"
    Abstract: http://www.comp.leeds.ac.uk/cgi-bin/sis/ext/rs_pub.cgi?cmd=displayabstract&sid=200081109
    Thesis: /research/pubs/theses/Latifa_MSc.pdf

    We are also writing a paper for IJCL; we could let you have a draft if
    you're interested...

    I'm sure Latifa would be happy to discuss issues further - do get in
    touch direct.

    Good luck with your project!

    Eric Atwell, School of Computing, Leeds University

    On Thu, 27 Jan 2005, Victoria Muehleisen wrote:

    > Hello Everyone,
    >
    > I teach English at a university in Japan, and we recently received some
    > grant money to set up a learners' corpus, of students' essays written
    > in English.
    >
    > Although we have some ideas of how we can begin doing research once we
    > have the corpus, we don't know anything about actually setting it up.
    > What are the best formats for storing the essays? For marking up the
    > data? What kind of information will be most useful to add to the
    > files? (For example, we know that we'll want to identify the level of
    > the class the essay was written for--there are basic, intermediate, and
    > advanced level writing courses--and we'll also want to code for the
    > native language of the writer--not all the students are Japanese--but
    > are there other kinds of variables we should keep track of?)
    >
    > We would appreciate references to books/articles/web sites on setting
    > up a learners' corpus, especially ones that don't assume too much
    > technical computer knowledge. We'll have people available to help us
    > with the technical side, but we need to tell them what we want to do.
    >
    > In addition to references, if there is anyone who has created a
    > learners' corpus and could warn us about any mistakes to avoid, that
    > would also be very helpful. And at the next stage, we'll need to start
    > thinking about issues of student privacy/permission, so any references
    > on those issues (in particular, ways that other corpus-creators have
    > done it) would be very useful.
    >
    > Thanking you in advance,
    >
    > *********************************
    > Victoria Muehleisen
    >
    > School of International Liberal Studies Waseda University
    > Nishi-Waseda 1-6-1
    > Shinjuku-ku, Tokyo 169-8050
    >
    > E-mail: <vicky@waseda.jp>
    > Home page: <www.f.waseda.jp/vicky>
    >
    >
    >

    -- 
    Eric Atwell, Senior Lecturer, Computer Vision and Language research group,
    School of Computing, University of Leeds, LEEDS LS2 9JT, England
    TEL: +44-113-2335430  FAX: +44-113-2335468  http://www.comp.leeds.ac.uk/eric
    


