[Corpora-List] Java Document Parsing for BNC

From: David J. Brooks (D.J.Brooks@cs.bham.ac.uk)
Date: Tue Feb 24 2004 - 00:15:42 MET

    Dear List Members,

    NOTE: By "parsing", I mean simply reading a BNC document into the machine,
    not performing syntactic analysis.

    Does anyone have or know of a reliable, easy-to-use set of Java libraries for
    parsing British National Corpus documents? I'm after something equivalent
    to the SAX or JAXP XML parsing libraries that offers (at least to some
    extent) DOM-style access. Ideally, I would like to be able to access all parts
    of a document, not simply the words (and punctuation).

    Thanks in advance,
    David


