Re: [Corpora-List] Re: Dictionary Creation Software

From: Leonel Ruiz Miyares (Centro Ling. Aplicada) (leonel@lingapli.ciges.inf.cu)
Date: Thu Sep 19 2002 - 13:23:18 MET DST

    On 18 Sep 02, at 16:03, Ramesh Krishnamurthy wrote:

    > Dear Dr De Lucca
    >
    > I have drawn up a checklist from my 15 years' experience in
    > corpus-based computational lexicography. I hope this helps.
    >
    > If you are going to create software for the whole process from raw
    > data to publishing of a dictionary/reference book, I think these would
    > be my requirements. Every process should be automated to the maximum,
    > with allowance for human intervention or input of preferences.
    >
    > 1. for monolingual dictionaries, a large corpus of L1
    > 2. for bilingual dictionaries, a large corpus of L1 and L2, with
    >    pointers in both directions to find suggested equivalent words
    >    and phrases
    > 3. lemmatized frequency lists, to decide which words are important
    >    enough to include in the dictionary, and which forms are
    >    significant, etc
    > 4. based on the frequency lists, a spelling checker, giving variant
    >    spellings
    > 5. pronunciation, with regional variations; concordanced tone units
    >    to hear word pronunciation in context
    > 6. statistics for regional variations
    > 7. statistics for genre distribution: is the wordform used in all
    >    types of text, or mainly in speech, mainly in newspapers, mainly
    >    in novels, etc
    > 8. grammar - wordclass identification, colligation, grammar patterns
    >    (valency, complementation, etc); with frequencies, regional
    >    variations, and genre-distribution
    > 9. collocation: individual collocates, lexical phrases, etc; with
    >    frequencies, regional variations, and genre-distribution
    > 10. semantics - hypernyms, hyponyms, synonyms (i.e. thesaurus),
    >     antonyms
    > 11. pragmatics - any relevant information
    > 12. selected examples for each point from 3 onwards; large corpora
    >     yield hundreds or thousands of examples, so
    > 13. spoken data: typical speaker, context, interlocutor, etc
    > 14. concordancer to allow access to raw data and ability to check
    >     the information given from point 3 onwards
    > 15. automatic cut-and-paste to dictionary or reference book database
    > 16. customizable database templates for reference books
    > 17. validation routines to ensure database entry fields contain
    >     correct information and are in correct sequence
    > 18. ability to interrogate database on any field or subfield, to
    >     count entries, check that editorial policies have been followed,
    >     check cross-references, check that examples contain the
    >     headword, etc
    > 19. automatic conversion from database to typesetting formats -
    >     columnation, page numbering, headers and footers, widows and
    >     orphans, typefaces, etc
    > 20. progress monitoring - which processes have been completed (e.g.
    >     compilation, editing, proofreading), which words have been done,
    >     who did them, when, etc
    >
    > All the tools should be flexible, to allow users to cater for local
    > variations in any feature, from orthographic form (capitalization,
    > punctuation, contractions, etc) to size of field in the databases,
    > etc.
    >
    > Best wishes
    > Ramesh
    >
    > Ramesh Krishnamurthy
    > Consultant, Collins Cobuild and Bank of English Corpus;
    > Honorary Research Fellow, Centre for Corpus Linguistics, University of
    > Birmingham; Honorary Research Fellow, Computational Linguistics
    > Research Group, University of Wolverhampton.
    >
    >
    > ----- Original Message -----
    > From: delucca@nilc.icmc.usp.br
    > To: corpora@hd.uib.no
    > Cc: delucca@usp.br
    > Subject: [Corpora-List] Dictionary Creation Software
    >
    > Dear Colleagues,
    >
    > We are a team of researchers in Computational Linguistics and, at
    > present, we are building software tools for making dictionaries.
    >
    > We would like to hear from those who have experience compiling
    > dictionaries and vocabularies about the following: WHAT you would
    > like, need, and hope for from Dictionary Creation Software. What
    > types of tools would be essential for making dictionaries,
    > vocabularies, and any other type of reference work? A concordancer?
    > A spelling checker? Pronunciation?
    >
    > We look forward to hearing from you with great interest.
    >
    > Thank you very much in advance for your advice.
    >
    > Sincerely
    >
    >
    >
    >
    > J.L. DeLucca, PhD
    >
    > Interinstitutional Center for Research and Development in
    > Computational Linguistics (NILC) Sao Paulo University
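
Three of the items in the checklist quoted above (3, lemmatized frequency
lists; 14, a concordancer; 18, checking that examples contain the
headword) are concrete enough to sketch in a few lines. The Python below is
only an illustration under stated assumptions, not the method of any tool
mentioned in this thread: the function names, the naive regex tokenizer, the
form-to-lemma lookup table and the entry layout (a dict with "headword" and
"examples") are all hypothetical.

```python
import re
from collections import Counter

# Assumption: a deliberately naive tokenizer (ASCII letters and apostrophes).
TOKEN_RE = re.compile(r"[A-Za-z']+")


def tokenize(text):
    """Return lowercase word tokens found in text."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def frequency_list(text, lemma_of=None):
    """Item 3: count tokens after mapping each form to its lemma.

    lemma_of is a hypothetical form -> lemma table; forms that are not in
    the table count as their own lemma.
    """
    lemma_of = lemma_of or {}
    return Counter(lemma_of.get(tok, tok) for tok in tokenize(text))


def kwic(text, keyword, width=3):
    """Item 14: return (left, node, right) context for each keyword hit."""
    tokens = tokenize(text)
    node = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


def examples_missing_headword(entries, lemma_of=None):
    """Item 18: list (headword, example) pairs where the example contains
    neither the headword nor a form that the table maps to it."""
    lemma_of = lemma_of or {}
    problems = []
    for entry in entries:
        head = entry["headword"].lower()
        for example in entry["examples"]:
            lemmas = {lemma_of.get(t, t) for t in tokenize(example)}
            if head not in lemmas:
                problems.append((entry["headword"], example))
    return problems


if __name__ == "__main__":
    corpus = "The cat sat. The cats sat on the mat."
    forms = {"cats": "cat"}
    print(frequency_list(corpus, forms).most_common(3))
    for left, node, right in kwic(corpus, "sat", width=2):
        print(f"{left:>12} [{node}] {right}")
    entries = [{"headword": "cat",
                "examples": ["Two cats slept.", "A dog barked."]}]
    print(examples_missing_headword(entries, forms))
```

A real system would replace the lookup table with a proper lemmatizer and
read entries from the dictionary database; the point here is only that the
frequency list, the concordance and the editorial check can all share one
tokenization step.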


