LexBuild

Close Match - PostProcess

  • I. ReIndex and Reload Dictionary
    • Why
      • LexiconLb:
        Indexes of inflVars are updated in LexiconLb when the Approve button is pressed. These indexes are kept in memory, and a save() method must then be called to write them from memory to file. Saving the index from the in-memory cache to the LexiconLb dictionary file takes about 90 sec. (GSpell.V0.0.40), which is too slow for this web application. Thus, the design is not to save to file when the Approve button is pressed. Instead, the indexes are saved automatically once every day by crontab.

      • LexiconLbTemp:
        GSpell does not provide a method to remove indexes from an existing dictionary, yet indexes in LexiconLbTemp need to be removed whenever records are approved. To work around this, we re-index the LexiconLbTemp dictionary and reload it into LexBuild. We do this automatically once every day by crontab.
    • When
      • Manually: save when the Save button is pressed
      • Manually: reindex and reload when the ReIndex button is pressed
      • Automatically: reindex once every day right after the backup process (crontab)
      • Automatically: reload when the first user logs in after reindex
    • What
      • ReIndex
        • Set status to "1 ReIndexing" in "gSpellStatus.txt"
        • Use crontab to reindex both dictionaries
        • Or manually press the ReIndex button
        • Regenerate inflVars and inflVarsTemp
        • Make the terms unique in both files
        • Reindex all terms in both files for both dictionaries, LexiconLb and LexiconLbTemp. Check that the owner and group of these two files are chlu/cgsb so that they can be reindexed by the daily crontab.
        • Disable GSpell interface features (ReIndex, Reload, Save) during the reIndex process by checking the gSpell status file ($LB/data/WebApp/Outputs/PostProc/gSpellStatus.txt)
        • Set status to "2 Ready for reload" in "gSpellStatus.txt"
      • Reload
        • The first user login after the crontab reindex triggers the reload (event driven)
        • Set status to "3 Ready for reIndex" in "gSpellStatus.txt"
          This is done in ${LEXBUILD}/Tools/WebScript/ReLoadStatus.
          For unknown reasons, this status update does not appear to work; however, the nightly reIndex crontab (ReIndexDic) still runs as scheduled.
        • Call gSpell.reload() to reload both dictionaries: LexiconLb and LexiconLbTemp
          From time to time, the reload does not work as expected. Restarting the Tomcat server can resolve this issue.
        • Use sudo crontab -e to schedule a daily/weekly Tomcat restart after Approval and reIndex.
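
The "Make the terms unique" step of ReIndex can be sketched in Java as below. This is a minimal sketch, not LexBuild's code: it assumes the inflVars files hold one term per line and that the InflVars.uSort / InflVarsTemp.uSort files named in Section II are their unique, sorted forms; the UniqueTerms class and its methods are hypothetical. It refuses to write an empty output, since an empty .uSort file leaves the gSpell save permanently busy.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper: build the unique, sorted term list used for reindexing.
final class UniqueTerms {

    // Trim each term, drop blank lines and duplicates, and return the terms in sorted order.
    static List<String> uniqueSorted(List<String> terms) {
        TreeSet<String> unique = new TreeSet<>();
        for (String term : terms) {
            String t = term.trim();
            if (!t.isEmpty()) {
                unique.add(t);
            }
        }
        return new ArrayList<>(unique);
    }

    // Read inFile (e.g. InflVars) and write its unique sorted terms to outFile (e.g. InflVars.uSort).
    // Assumption: an empty result is treated as an error, because an empty .uSort keeps gSpell busy.
    static int writeUniqueSorted(Path inFile, Path outFile) throws IOException {
        List<String> terms = uniqueSorted(Files.readAllLines(inFile, StandardCharsets.UTF_8));
        if (terms.isEmpty()) {
            throw new IllegalStateException("No terms in " + inFile + "; not writing " + outFile);
        }
        Files.write(outFile, terms, StandardCharsets.UTF_8);
        return terms.size();
    }
}
```

TreeSet orders terms by Java's String.compareTo, which can differ from a locale-aware Unix sort -u, so the output order may not match a shell-based pipeline byte for byte.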

      ${LB}/data/WebApp/Outputs/PostProc/gSpellStatus.txt

      Status  Description         Process
      1       ReIndexing          Call $LB/Tools/WebScript/ReIndexDic
      2       Ready for reload    The first user login after reIndex reloads the dictionary
      3       Ready for reIndex   After reload
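
The status lifecycle in this table can be modeled as a small state machine. The Java sketch below is illustrative only: the GSpellStatus enum, the GSpellStatusFile class, and the DictionaryReloader interface (a stand-in for the call gSpell.reload()) are hypothetical names, and it assumes the file holds a single line such as "2 Ready for reload". It reports the GSpell features as disabled while the status is 1, and reloads only on the first login that finds status 2.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical model of the gSpellStatus.txt values listed in the table.
enum GSpellStatus {
    REINDEXING(1, "ReIndexing"),
    READY_FOR_RELOAD(2, "Ready for reload"),
    READY_FOR_REINDEX(3, "Ready for reIndex");

    final int code;
    final String label;

    GSpellStatus(int code, String label) {
        this.code = code;
        this.label = label;
    }

    // Text written to the status file, e.g. "2 Ready for reload".
    String line() {
        return code + " " + label;
    }

    // Parse the leading status number of a line such as "1 ReIndexing".
    static GSpellStatus parse(String line) {
        int code = Integer.parseInt(line.trim().split("\\s+", 2)[0]);
        for (GSpellStatus s : values()) {
            if (s.code == code) {
                return s;
            }
        }
        throw new IllegalArgumentException("Unknown gSpell status: " + line);
    }
}

// Hypothetical stand-in for gSpell.reload(); not the real GSpell type.
interface DictionaryReloader {
    void reload();
}

final class GSpellStatusFile {
    private final Path file;

    GSpellStatusFile(Path file) {
        this.file = file;
    }

    GSpellStatus read() throws IOException {
        return GSpellStatus.parse(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
    }

    void write(GSpellStatus status) throws IOException {
        Files.write(file, (status.line() + "\n").getBytes(StandardCharsets.UTF_8));
    }

    // ReIndex, Reload and Save are disabled while the reindex is running.
    boolean gSpellFeaturesEnabled() throws IOException {
        return read() != GSpellStatus.REINDEXING;
    }

    // Called on every login; only the first login after a reindex triggers the reload.
    synchronized boolean reloadIfReady(DictionaryReloader gSpell) throws IOException {
        if (read() != GSpellStatus.READY_FOR_RELOAD) {
            return false;
        }
        gSpell.reload();
        write(GSpellStatus.READY_FOR_REINDEX);
        return true;
    }
}
```

The synchronized method keeps two simultaneous logins in the same JVM from both reloading; it does not coordinate with the crontab script, which writes the file from outside Tomcat.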

  • II. Save Dictionary
    • Why
      GSpell.update() saves new indexes in memory. However, if the web server or the application crashes, the indexes in memory are lost. GSpell.save() provides an API to save the indexes from memory to the dictionary file. In LexBuild, we can also reindex and reload the dictionary, as described above, to prevent losing indexes.
    • When
      Because of the daily automatic reindex and reload, there is no need to save the dictionary from memory. However, LexBuild provides a feature for the SA to manually save indexes from memory to file.
    • What
      • Saving LexiconLb (850,000 terms) takes more than 20 min. Thus, LexBuild runs the save in a separate thread so that it does not slow down the web application.
      • Call gSpell.save() to save indexes from memory to file
      • Disable the "Approve" button during the save process by checking the value of gSpell.isBusy()
      • Disable GSpell-related interface features during the save process by checking the value of gSpell_.isBusy():
        • Approval -> Creation
        • Approval -> Modification
        • Post-Proc -> GSpell (ReIndex, Reload, Save features)
      • If ${LEXICON_OUTPUTS}/InflVars.uSort or ${LEXICON_OUTPUTS}/InflVarsTemp.uSort is empty (0 lines), the gSpell dictionary (gSpell Java object) never completes the save (it is always busy). In that case, the disabled buttons are always shown after pressing Post-Proc -> GSpell -> Save.
      • Moreover, if the Apache Tomcat server is restarted while ${LEXICON_OUTPUTS}/InflVarsTemp.uSort is empty (0 lines), gSpell_ is null, so the call to gSpell_.isBusy() throws a NullPointerException.
      • We fix this by checking whether gSpell_ is null; if so, the busy status is set to false instead of calling gSpell_.isBusy().
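
The save-in-a-thread design and the null check can be sketched together in Java. This is a hedged sketch: SpellDictionary is a hypothetical interface standing in for the gSpell object's save() and isBusy() calls, and GSpellSaveController is an illustrative name, not part of GSpell or LexBuild. A null dictionary is reported as not busy, matching the fix described above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for the two gSpell calls named above; not the real GSpell class.
interface SpellDictionary {
    void save();      // write in-memory indexes to the dictionary file
    boolean isBusy(); // true while a save is in progress
}

final class GSpellSaveController {
    // One daemon worker thread, so a long save never blocks a request thread.
    private final ExecutorService saver = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "gspell-save");
        t.setDaemon(true);
        return t;
    });

    // gSpell may be null, e.g. after a Tomcat restart with an empty InflVarsTemp.uSort.
    static boolean isBusy(SpellDictionary gSpell) {
        return gSpell != null && gSpell.isBusy();
    }

    // Approve and the GSpell features are enabled only when no save is running.
    static boolean approvalEnabled(SpellDictionary gSpell) {
        return !isBusy(gSpell);
    }

    // Start the save in the background; returns null if there is no dictionary or a save is running.
    Future<?> saveInBackground(SpellDictionary gSpell) {
        if (gSpell == null || gSpell.isBusy()) {
            return null;
        }
        return saver.submit(gSpell::save);
    }
}
```

With a single-thread executor, a second Save request that arrives before isBusy() turns true is queued behind the first rather than run concurrently.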