LexBuild

Close Match - PreProcess

  • GSpell Installation & Configuration
    • Download gspellProjectSrc.jar
    • Open and put it under /usr/local/Applications/GSpell/gspell_V0.0.40.${HOST_NAME}
      => jar -xvf gspellProjectSrc.jar to create nls directory
      => GSpell must be installed on a local hard disk because it uses Berkeley DB, which does not support NFS (Network File System). See the Oracle documentation for details.
    • Go to ${DEV}/LB/GSpell/
    • Create a link named ${HOST_NAME}:
      ln -sf /usr/local/Applications/GSpell/gspell_V0.0.40.${HOST_NAME} ${HOST_NAME}
      If installing for a migration swap, install as ${HOST_NAME} without the [-swap] suffix, so the name does not have to be changed back during the swap.
    • chmod +x ${LOCAL_APP}/GSpell/gspell_V0.0.40.${HOST_NAME}/nls/gspell/install.sh
    • Install it (${LOCAL_APP}/GSpell/gspell_V0.0.40.${HOST_NAME}/nls/gspell/install.sh)
      • shell> cd ${LOCAL_APP}/GSpell/gspell_V0.0.40.${HOST_NAME}/nls/gspell
      • ./install.sh
        ... follow the instructions and press the Return key to continue ...
        ===================================================
        About to add to the jar -> GSpellRegistry.cfg
                             as ->|gspell/config/GSpellRegistry.cfg|
        ===================================================
        About to add to the jar -> GSpellRegistry.cfg
                             as ->|gspell/config/GSpellRegistry.cfg|
        ======================
        Installation Complete
        ======================
        		
    • Set GSpell to Unicode (default after V0.0.40):
      ${LOCAL_APP}/GSpell/gspell_V0.0.40.${HOST_NAME}/nls/gspell/config/GSpellRegistry.cfg
      -u|--unicode|boolean|true|Output in unicode

    • Copy the dictionary directory from the original installation to here if needed (for migration)
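
The link and Unicode steps above can be sketched as two small shell helpers. This is a minimal sketch, not part of the GSpell distribution: the function names link_gspell_host and gspell_unicode_enabled are hypothetical, the handling of the -swap suffix follows the migration note above, and -n is added to ln -sf so that an existing link is replaced rather than followed.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not part of GSpell.

# Create or refresh the per-host link.
# $1: install directory, e.g. /usr/local/Applications/GSpell/gspell_V0.0.40.${HOST_NAME}
# $2: directory that holds the link, e.g. ${DEV}/LB/GSpell
# $3: host name; a trailing "-swap" is dropped, per the migration note
link_gspell_host() {
  install_dir=$1
  link_parent=$2
  host=${3%-swap}
  # -n replaces an existing link instead of creating a new link inside its target
  ln -sfn "$install_dir" "$link_parent/$host"
}

# Succeed if the registry file has the Unicode option set to true.
# $1: path to GSpellRegistry.cfg
gspell_unicode_enabled() {
  grep -q -e '^[[:space:]]*-u|--unicode|boolean|true|' "$1"
}
```

For example, gspell_unicode_enabled "${LOCAL_APP}/GSpell/gspell_V0.0.40.${HOST_NAME}/nls/gspell/config/GSpellRegistry.cfg" returns a non-zero status if the Unicode line is missing or set to false.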

  • Test GSpell
    • Test directory:
      • shell> cd ${GSPELL_DIR}/GSpelltest
    • Setup - create dictionary in gSpell:
      • copy inflVars:
        shell>cp ${LB}/data/WebApp/Outputs/Lexicon/InflVars ${GSPELL_DIR}/GSpelltest/data/inflVars
      • copy inflVarsTemp:
        cp ${LB}/data/WebApp/Outputs/Lexicon/InflVarsTemp ${GSPELL_DIR}/GSpelltest/data/inflVarsTemp
      • Generate the input files (uSort) for the gSpell dictionary:
        shell>${GSPELL_DIR}/GSpelltest/GenerateInFile
        => InflVars.uSort & InflVarsTemp.uSort
      • Create dictionary, LexiconTest:
        ${GSPELL_DIR}/GSpelltest/GSpellIndex ${HOST_NAME} LexiconTest ./data/inflVarsTemp.uSort
        => This creates the dictionary as: ${GSPELL_DIR}/gspell_V0.0.40.${HOST_NAME}/nls/gspell/dictionaries/LexiconTest
    • Test:
      • go to test directory:
        shell> cd ${GSPELL_DIR}/GSpelltest
        • Test closematch:
          shell> GSpellCloseMatch ${HOST_NAME} <dictionary>
          >input a term that is similar to one in the InflVarsTemp
        • Test addIndex:
          shell> GSpellUpdateIndex ${HOST_NAME} <dictionary>
          Input a term that is not in the dictionary, then look it up with GSpellCloseMatch to confirm that it was added.
        • Notes (Issues):
          => This won't work if gSpell is installed on NFS (centralized filer) because it uses Berkeley DB, which does not support NFS.
          => A gSpell object cannot be declared static (in Java) when it is used from multiple threads, even with synchronized access.
      • Test 1 dictionary: GSpellUpdateIndexCloseMatch <dictionary>
      • Test 2 dictionaries: GSpellUpdateIndexCloseMatch2 <dictionary1> <dictionary2>
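
Because of the NFS restriction noted above, it can help to check the filesystem type of the install directory before installing or testing. The following is a minimal sketch, assuming GNU df (where -T adds a filesystem-type column; other platforms may differ). The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; assumes GNU df, where -T adds a filesystem-type column.

# Succeed if the given filesystem type name is an NFS variant (nfs, nfs4, ...).
is_nfs_type() {
  case "$1" in
    nfs*) return 0 ;;
    *)    return 1 ;;
  esac
}

# Print the type of the filesystem that holds the given path.
fs_type_of() {
  # -P keeps each entry on one line; the type is the second column of row 2
  df -PT "$1" | awk 'NR == 2 { print $2 }'
}

# Warn and fail if the path is on NFS; succeed otherwise.
check_local_disk() {
  if is_nfs_type "$(fs_type_of "$1")"; then
    echo "WARNING: $1 is on NFS; install gSpell on a local disk instead" >&2
    return 1
  fi
}
```

A typical call would be check_local_disk /usr/local/Applications/GSpell before running install.sh.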

  • Create dictionaries, LexiconLb and LexiconLbTemp

    • Program: ${LB}/Tools/PostProcessing/GenerateGSpellDic
      
      	> GenerateGSpellDic
      	1
      	1
      	
      • Generate inflVars: gov.nih.nlm.nls.lexBuild.Db.GenerateInflVars
      • Generate inflVars2: flds 1 inflVars | sort -u
      • Create dictionary, LexiconLb: gov.nih.nlm.nls.gspell.GSpell --index --dictionaryName=${DICTIONARY} --inputFile=${INPUT_FILE} --reportTime

    • Program: $LEXBUILD_DIR/Tools/PostProcessing/GenerateGSpellDic
      
      	> GenerateGSpellDic
      	2
      	1
      	
      • Generate inflVarsTemp: gov.nih.nlm.nls.lexBuild.Db.GenerateInflVarsTemp
      • Generate inflVarsTemp2: flds 1 inflVarsTemp | sort -u
      • Create dictionary, LexiconLbTemp: gov.nih.nlm.nls.gspell.GSpell --index --dictionaryName=${DICTIONARY} --inputFile=${INPUT_FILE} --reportTime
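
Both runs reduce the generated file to a sorted list of unique first fields before indexing. As a rough illustration of that step, the sketch below reproduces "flds 1 ... | sort -u" with standard tools. It assumes that flds 1 selects the first pipe-delimited field, which should be checked against the local flds tool. The function name is hypothetical and the sample records are made up for illustration; they are not real Lexicon data.

```shell
#!/bin/sh
# Assumption: fields are separated by "|" and flds 1 keeps the first one.

# Read records on stdin and print the sorted, unique first fields.
unique_first_field() {
  cut -d '|' -f 1 | LC_ALL=C sort -u
}

# Hypothetical sample records (not real Lexicon data).
printf '%s\n' \
  'tests|noun|plural' \
  'test|noun|base' \
  'test|verb|base' |
  unique_first_field
```

With the sample above, each distinct term (test, tests) is printed once, so a term that appears in several records is indexed only once.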

  • Modifications in LexBuild
    • Increase heap size of JVM for Tomcat:
      ${WWW}/Tomcat/apache-tomcat-5.5.20/bin/catalina.sh
      add JAVA_OPTS="-Xmx1800m"
    • Update lexBuild configuration for GSpell directory
      ${LB}/LexBuild/data/WebApp/Config/lexBuild.cfg
      update GSPELL_DIR
    • Update $DEV/LB/WebLexBuild/web/WEB-INF/lib/gspellProject.jar
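
For the heap-size step above, one way to add the option without discarding options that are already set is shown below. This is a sketch of lines for catalina.sh that uses only standard shell parameter expansion; where exactly they go in the script is an assumption left to the maintainer.

```shell
# Append the heap limit, keeping any JAVA_OPTS value that is already set.
JAVA_OPTS="${JAVA_OPTS:+$JAVA_OPTS }-Xmx1800m"
export JAVA_OPTS
```

If JAVA_OPTS is empty or unset, the result is just -Xmx1800m; otherwise the option is appended after a single space.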