LexBuild

LexBuild - Close Match - GSpell

I. Overview

GSpell provides the close match function in LexBuild. GSpell must be installed on a local hard disk (not the LHC filer), for example in /usr/local/Application/GSpell. Close-matched terms are indexed in the GSpell dictionaries:

  • LexiconLb: includes all approved base/inflVars of LEXICON
  • LexiconLbTemp: includes all submitted (unapproved) base/inflVars of LEXICON

New terms need to be indexed and added to the dictionaries for the close match function. This process is done when a newly created term is approved and submitted (created) in LexBuild. However, terms that are deleted from the LEXICON are not removed from the dictionaries. Also, the indexing process takes too long (20 to 30 minutes) to run as a real-time operation each time a new term is added. Thus, the indexing process is designed as follows:

  • update indexing and dictionary nightly
  • use crontab to call ${LB_TOOLS}/LoadDb/AutoApprove at 3:00 AM
  • use crontab to call ${LB_TOOLS}/WebSCript/ReIndexDic at 3:20 AM
    • generate inflVars and inflVarsTemp
    • sort inflVars and inflVarsTemp on field 1
    • index dictionaries of LexiconLb and LexiconLbTemp
    • change owner to "chlu" and group to "cgsb"
    • update the GSpell status in gSpellStatus.txt to [2 Ready for reload], so that GSpell reloads at the next user login. However, this GSpell reload function is not stable, so it is replaced by restarting the Tomcat server afterwards.
  • use crontab (sudo) to restart the Tomcat server at 4:00 AM to reload GSpell
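
The nightly schedule and the file-handling steps of ReIndexDic listed above can be sketched as follows. This is only an illustration, not the actual ReIndexDic script: the '|' field delimiter, the function names, and the exact text written to the status file are assumptions, and the Tomcat restart command is left as a placeholder because the source does not give it.

```shell
#!/bin/sh
# Hypothetical crontab entries matching the times listed above.
# LB_TOOLS must be defined in the crontab (or the paths written out in full).
#   0  3 * * *  ${LB_TOOLS}/LoadDb/AutoApprove
#   20 3 * * *  ${LB_TOOLS}/WebSCript/ReIndexDic
#   0  4 * * *  <command that restarts Tomcat>    (in root's crontab)

# Sort a file in place on its first field.
# Assumption: fields are separated by '|'.
sort_field1() {
    file=$1
    LC_ALL=C sort -t '|' -k 1,1 -o "$file" "$file"
}

# Hand the given files to owner "chlu" and group "cgsb".
# Usually needs root privileges, so it is not run in this sketch.
set_owner() {
    chown chlu:cgsb "$@"
}

# Record that the dictionaries are ready to be reloaded.
# Assumption: the status file holds this single line.
mark_ready_for_reload() {
    status_file=$1
    printf '%s\n' '2 Ready for reload' > "$status_file"
}
```

A run would call sort_field1 on inflVars and inflVarsTemp, rebuild the LexiconLb and LexiconLbTemp indexes with the GSpell tools (not shown), then call set_owner and mark_ready_for_reload with the path to gSpellStatus.txt.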

II. Detailed Description

Below are the detailed steps of the close match function in LexBuild using the GSpell installation.

  • Pre-process
    • GSpell Installation and Configuration
      => Must be installed at /usr/local/Applications
    • Test GSpell
    • Create dictionary, LexiconLb and LexiconLbTemp
    • Modifications in:
      • LexBuild:
        • ${TOMCAT}/webapps/WebLexBuild/WEB-INF/web.xml => configFile
        • ${LB_DIR}/data.${HOSTNAME}/WebApp/Config/lexBuild.cfg => GSPELL_DIR
      • Tomcat: /etc/tomcat/tomcat.conf
        => JAVA_OPTS="-Xmx1700m", to limit the maximum heap size so that it does not take too much memory.

  • Process
    • Find the close match
    • Update dictionaries
    • Implementation
      • Find the exact match (by term)
      • Find close match
      • Display matches
      • Add index to LexiconLb and LexiconLbTemp dictionaries in GSpell
        => These indexes are saved in memory

  • Post-Process
    • Reload LexiconLb and LexiconLbTemp dictionaries