[Corpora-List] Re: Dictionary Creation Software

From: Ramesh Krishnamurthy (ramesh@easynet.co.uk)
Date: Wed Sep 18 2002 - 17:03:14 MET DST


    Dear Dr De Lucca

    I have drawn up a checklist from my 15 years' experience in corpus-based computational lexicography.
    I hope this helps.

    If you are going to create software for the whole process from raw data to publishing
    of a dictionary/reference book, I think these would be my requirements.
    Every process should be automated to the maximum, with allowance for human intervention
    or input of preferences.

    1. for monolingual dictionaries, a large corpus of L1
    2. for bilingual dictionaries, a large corpus of L1 and L2, with pointers in both directions to find
    suggested equivalent words and phrases
    3. lemmatized frequency lists, to decide which words are important enough to include in the dictionary,
    and which forms are significant, etc
    4. based on the frequency lists, a spelling checker, giving variant spellings
    5. pronunciation, with regional variations; concordanced tone units to hear word pronunciation in context
    6. statistics for regional variations
    7. statistics for genre distribution: is the wordform used in all types of text, or mainly in speech,
    mainly in newspapers, mainly in novels, etc
    8. grammar - wordclass identification, colligation, grammar patterns (valency, complementation, etc);
    with frequencies, regional variations, and genre-distribution
    9. collocation: individual collocates, lexical phrases, etc; with frequencies, regional variations, and genre-distribution
    10. semantics - hypernyms, hyponyms, synonyms (i.e. thesaurus), antonyms
    11. pragmatics - any relevant information
    12. selected examples for each point from 3 onwards; large corpora yield hundreds or thousands of examples, so
    13. spoken data: typical speaker, context, interlocutor, etc
    14. concordancer to allow access to raw data and ability to check the information given from point 3 onwards
    15. automatic cut-and-paste to dictionary or reference book database
    16. customizable database templates for reference books
    17. validation routines to ensure database entry fields contain correct information and are in correct sequence
    18. ability to interrogate database on any field or subfield, to count entries, check that editorial policies have been followed,
    check cross-references, check that examples contain the headword, etc
    19. automatic conversion from database to typesetting formats - columnation, page numbering, headers and footers, widows and orphans, typefaces, etc
    20. progress monitoring - which processes have been completed (e.g. compilation, editing, proofreading), which words have been done, who did them, when, etc
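
    As a rough illustration of points 3 and 14, the sketch below builds a lemmatized
    frequency list and a simple keyword-in-context concordancer. The lemma table, the
    function names, and the sample sentence are all hypothetical; a real system would
    rely on a full morphological lexicon and a much larger corpus.

```python
import re
from collections import Counter

# Hypothetical toy lemma table (an assumption for illustration only);
# a real system would use a full morphological lexicon.
LEMMAS = {"dictionaries": "dictionary", "words": "word", "corpora": "corpus"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def lemma_frequencies(tokens):
    """Count tokens after mapping each wordform to its lemma (point 3)."""
    return Counter(LEMMAS.get(tok, tok) for tok in tokens)

def concordance(tokens, lemma, width=3):
    """Return keyword-in-context lines for every form of `lemma` (point 14)."""
    lines = []
    for i, tok in enumerate(tokens):
        if LEMMAS.get(tok, tok) == lemma:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

sample = ("Large corpora help us decide which words belong in dictionaries. "
          "A corpus shows how a word is used.")
tokens = tokenize(sample)
freq = lemma_frequencies(tokens)
kwic = concordance(tokens, "corpus", width=2)
```

    Here freq counts "corpora" and "corpus" under the single lemma "corpus", and kwic
    lists each occurrence with two words of context on either side, so an editor can
    check the figures against the raw data.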

    All the tools should be flexible, to allow users to cater for local variations in any feature, from orthographic form (capitalization, punctuation, contractions, etc)
    to size of field in the databases, etc.
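
    One possible way to combine this flexibility with the validation routines of
    points 17 and 18 is to keep the editorial policy in a user-editable table rather
    than in the program itself. The sketch below assumes a hypothetical policy
    (field names, field order, and size limits are invented for illustration) and
    checks a single entry against it.

```python
# Hypothetical editorial policy: field names, order, and limits are assumptions
# that each project would replace with its own settings.
POLICY = {
    "field_order": ["headword", "wordclass", "definition", "example"],
    "max_length": {"headword": 40, "definition": 300},
    "example_must_contain_headword": True,
}

def validate_entry(entry, policy=POLICY):
    """Return a list of human-readable problems found in one entry."""
    problems = []
    expected = policy["field_order"]
    # Every configured field must be present.
    for field in expected:
        if field not in entry:
            problems.append(f"missing field: {field}")
    # Fields that are present must follow the configured sequence.
    present = [f for f in entry if f in expected]
    if present != [f for f in expected if f in entry]:
        problems.append("fields out of sequence")
    # Field sizes are limited by the policy, not hard-coded.
    for field, limit in policy["max_length"].items():
        if len(entry.get(field, "")) > limit:
            problems.append(f"{field} exceeds {limit} characters")
    # Simple substring check; inflected forms of the headword are not recognized.
    if policy["example_must_contain_headword"]:
        head = entry.get("headword", "").lower()
        example = entry.get("example", "").lower()
        if head and example and head not in example:
            problems.append("example does not contain the headword")
    return problems

good = {"headword": "corpus", "wordclass": "noun",
        "definition": "a collection of texts",
        "example": "The corpus contains newspapers."}
bad = {"wordclass": "noun", "headword": "corpus",
       "definition": "a collection of texts",
       "example": "It has many texts."}
good_problems = validate_entry(good)
bad_problems = validate_entry(bad)
```

    Changing a limit or the field order then means editing POLICY only. The headword
    check is a plain substring match, so an example that uses only an inflected form
    would be flagged and need human review.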

    Best wishes
    Ramesh

    Ramesh Krishnamurthy
    Consultant, Collins Cobuild and Bank of English Corpus;
    Honorary Research Fellow, Centre for Corpus Linguistics, University of Birmingham;
    Honorary Research Fellow, Computational Linguistics Research Group, University of Wolverhampton.

    ----- Original Message -----
    From: delucca@nilc.icmc.usp.br
    To: corpora@hd.uib.no
    Cc: delucca@usp.br
    Subject: [Corpora-List] Dictionary Creation Software

    Dear Colleagues,

    We are a team of researchers in Computational Linguistics and, at the
    present time, we are building software tools for making
    dictionaries.

    We would like to hear from those who have experience compiling
    dictionaries and vocabularies about the following: WHAT you would like,
    need, and hope for in Dictionary Creation Software. What types of tools
    would be essential for making dictionaries, vocabularies, and any other
    type of reference work? A concordancer? A spelling checker? Pronunciation?

    We look forward to hearing from you with great interest.

    Thank you very much in advance for your advice.

    Sincerely

    J.L. DeLucca, PhD

    Interinstitutional Center for Research and Development in Computational
    Linguistics (NILC)
    Sao Paulo University


