Corpora: KWiCFinder, a free Web Concordancer

From: William H. Fletcher (fletcher@usna.edu)
Date: Mon Apr 23 2001 - 18:46:06 MET DST


    For several years I have been developing KWiCFinder, a PC-based concordancer
    for the Web which conducts a user's search and produces a KWiC concordance
    of the search terms. This program was conceived by a linguist for linguists,
    but it is a powerful research tool for any field, whether one is interested
    in form or content. A stable but incomplete (especially the documentation!)
    preliminary release of this free program is now available for download at

       http://miniappolis.com/KWiCFinder/

    I would appreciate feedback and suggestions from colleagues in the corpus
    community on usefulness and potential improvements to the program.

    KWiCFinder uses the AltaVista search engine. It helps the user formulate a
    query, then downloads documents matching the query and displays Key Word in
    Context excerpts in a variety of formats and languages. Downloaded documents
    can be saved in HTML and/or text formats, so they'll still be there when you
    need them.
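
    The Key Word in Context display described above can be illustrated with a
    minimal Python sketch. It is not KWiCFinder's code: the function name
    `kwic` and the `width` parameter are hypothetical, and the sketch works on
    text that has already been downloaded and stripped of markup.

```python
import re

def kwic(text: str, term: str, width: int = 30) -> list[str]:
    """Return one aligned Key Word in Context line per match of `term`.

    Hypothetical helper for illustration; `width` is the number of
    context characters shown on each side of the key word.
    """
    # Collapse line breaks and runs of spaces so each excerpt stays on one line.
    text = " ".join(text.split())
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the key words line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines

for line in kwic("The ship will sink. They sink ships.", "sink", width=10):
    print(line)
```

    Padding both contexts to a fixed width keeps the key word in the same
    column on every line, which is what makes a concordance easy to scan.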

    KWiCFinder also offers refinements that narrow the search further than
    AltaVista's complex Boolean criteria normally allow. It introduces wildcards
    that match exactly one character (AltaVista's own *, by contrast, matches
    0-5 characters) and the "sic" option, which blocks lower-case or "plain"
    characters from matching upper-case or accented ones. Without the "sic"
    option, German "wurde" also matches "würde", and Spanish "continuo" also
    matches "continúo" and "continuó".
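
    The effect of the "sic" option can be sketched in a few lines of Python.
    This is an illustration under stated assumptions, not KWiCFinder's
    implementation: the names `fold` and `matches` are hypothetical, and the
    sketch assumes that an upper-case or accented character in the query only
    ever matches itself.

```python
import unicodedata

def fold(text: str) -> str:
    """Strip diacritics and lower-case the text, e.g. 'Würde' -> 'wurde'."""
    decomposed = unicodedata.normalize("NFD", text)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return plain.lower()

def matches(term: str, word: str, sic: bool = False) -> bool:
    """Hypothetical matcher comparing a query term with a word from a document."""
    term = unicodedata.normalize("NFC", term)
    word = unicodedata.normalize("NFC", word)
    if len(term) != len(word):
        return False
    for t, w in zip(term, word):
        if sic or t != fold(t):
            # "sic" is on, or the query character is itself upper-case or
            # accented (assumption): require an exact match.
            if t != w:
                return False
        elif fold(w) != t:
            # Plain lower-case query character: also accept upper-case
            # and accented variants of it.
            return False
    return True

print(matches("wurde", "würde"))            # matches without "sic"
print(matches("wurde", "würde", sic=True))  # blocked by "sic"
```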

    KWiCFinder's "Tamecards" provide a shortcut method of specifying variants
    without matching as many undesired forms as wildcards would, e.g.
       sink[,s,ing]
    expands to
       sink sinks sinking
    and
       s[iau]nk[,s,ing]
    expands to all possible forms of the verb "to sink" (as well as to
    nonsense forms such as "sanks" and "sunking"). Similarly, "on-line", with
    the implicit tamecard "-", also matches "online" and "on line".
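
    The Tamecard expansions shown above can be reproduced with a short Python
    sketch. It is an illustration, not KWiCFinder's code: the function name
    `expand_tamecards` is hypothetical, and the bracket rules are inferred
    from the two examples only (comma-separated alternatives when the bracket
    contains a comma, otherwise one alternative per character).

```python
import itertools
import re

# A bracketed tamecard, an implicit "-" tamecard, or a run of literal text.
TOKEN = re.compile(r"\[([^\]]*)\]|(-)|([^\[-]+)")

def expand_tamecards(pattern: str) -> list[str]:
    """Expand a tamecard pattern into every form it stands for."""
    choices = []
    for m in TOKEN.finditer(pattern):
        bracket, hyphen, literal = m.groups()
        if bracket is not None:
            if "," in bracket:
                # [,s,ing] -> "", "s", "ing"
                choices.append(bracket.split(","))
            else:
                # [iau] -> "i", "a", "u" (assumed from the example)
                choices.append(list(bracket))
        elif hyphen is not None:
            # Implicit tamecard: hyphenated, solid, or written as two words.
            choices.append(["-", "", " "])
        else:
            choices.append([literal])
    return ["".join(combo) for combo in itertools.product(*choices)]

print(expand_tamecards("sink[,s,ing]"))
print(expand_tamecards("s[iau]nk[,s,ing]"))
print(expand_tamecards("on-line"))
```

    Taking the Cartesian product of the alternatives is what yields nonsense
    forms such as "sanks" alongside the real ones: three vowels times three
    endings give nine forms in all.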

    KWiCFinder distinguishes between "search terms," which appear in the report,
    and "selection criteria," which narrow the search but are not reported on.

    Once a search has been launched, KWiCFinder works in the background, without
    user intervention. It can download and analyze a virtually unlimited number
    of documents sequentially at the rate of 5-20 documents per minute. By
    launching additional instances of the program, one can conduct a number of
    searches simultaneously.

    Search reports are encoded in XML and transformed to HTML for display.
    Consequently, the language and format options can be changed after the
    search, and the end-user can even modify and extend them by editing the XSLT
    stylesheets. The XML-based approach also permits documents and citations to
    be annotated, categorized or deleted, and reports from different searches
    to be merged.
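
    Merging XML reports from different searches might look roughly like the
    following Python sketch. The report schema is not described in this
    message, so the element names `report`, `document` and `cite`, the `url`
    attribute and the function `merge_reports` are all hypothetical.

```python
import xml.etree.ElementTree as ET

def merge_reports(*xml_reports: str) -> ET.Element:
    """Combine several reports into one, keeping each document URL once.

    Assumes a made-up schema: <report> containing <document url="...">.
    """
    merged = ET.Element("report")
    seen = set()
    for xml_text in xml_reports:
        root = ET.fromstring(xml_text)
        for doc in root.findall("document"):
            url = doc.get("url")
            if url in seen:
                continue  # the same page was already found by an earlier search
            seen.add(url)
            merged.append(doc)
    return merged

first = '<report><document url="http://a.example/1"><cite>x</cite></document></report>'
second = ('<report><document url="http://a.example/1"/>'
          '<document url="http://b.example/2"/></report>')
merged = merge_reports(first, second)
print(ET.tostring(merged, encoding="unicode"))
```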

    KWiCFinder is still under development. Your observations and suggestions
    will be received with enthusiasm!

    Bill Fletcher

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

      William H. Fletcher (410) 293-6362 [voice]
      Associate Professor of German and Spanish (410) 293-2729 [fax]
      Language Studies Department (DSN 281-xxxx)
      US Naval Academy
      589 McNair Road
      Annapolis, MD 21402-5030

      fletcher@usna.edu
      http://www.usna.edu/LangStudy/

    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -


