Text Categorization

TC Package - Annual Release Procedures

This page describes the annual release procedure for the Text Categorization (TC) tools package with a new set of training data.

  1. Prepare tc${YEAR} baseline
    • Copy tc${PREV_YEAR} to tc${YEAR}
      shell> cp -rp ${TC}/tc${PREV_YEAR} ${TC}/tc${YEAR}
    • Change ${PREV_YEAR} to ${YEAR} in build.html files under ${TC}, ${TC}/examples, ${TC}/install
    • Change ${PREV_YEAR} to ${YEAR} in ${TC}/overview.html
    • Update ${TC}/data/Config/tc.properties, tc.properties.TEMPLATE
      => Try building with shell> ant release (the build should succeed)
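The year substitutions in step 1 could be scripted roughly as follows. This is a minimal sketch, not the project's own tooling: the function name `replace_year` is hypothetical, and it assumes the year string appears in these files only where it should change, so it is worth reviewing the matches with grep first.

```shell
# Replace every occurrence of one year string with another in the given files.
# Hypothetical helper. Usage: replace_year PREV_YEAR YEAR FILE...
replace_year() (
  prev=$1; new=$2; shift 2
  for f in "$@"; do
    # Write to a temporary file first so a failed sed never truncates the original.
    tmp="$f.tmp.$$"
    sed "s/$prev/$new/g" "$f" > "$tmp" && cat "$tmp" > "$f" && rm -f "$tmp" || exit 1
  done
)
```

A possible call for the files named above would be `replace_year "$PREV_YEAR" "$YEAR" "$TC/build.html" "$TC/examples/build.html" "$TC/install/build.html" "$TC/overview.html"`.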
  2. Update Lib/*.jar files
    • Update ${TC}/lib/Other/lvg${YEAR}api.jar
    • Update ${TC}/lvg${YEAR}lite
      => This is needed to run stWsd (unpack from lvg${YEAR}lite.tgz)
    • Update ${TC}/lib/jdbcDrivers/hsqldb.jar
  3. Update Java source code
    • Modify the prolog of the Java files
      • Remove all SCRs-XX from the history tag
      • Update V-${PREV_YEAR} in the version tag
        shell> cd ${LVG}/Components/BaselineCode/bin
        shell> ModifyTcJavaCode
        shell> YYYY (${YEAR})
        shell> 1
        shell> y
        => Build and test to make sure the results are the same as in the last release
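After ModifyTcJavaCode has run, one way to confirm that no Java file still carries the old version tag is a simple search. The sketch below assumes the tag appears literally as `V-` followed by the year; the function name `find_stale_tags` is hypothetical and not part of the TC package.

```shell
# List Java files under a directory that still contain a given marker string.
# Hypothetical helper. Usage: find_stale_tags DIR MARKER
find_stale_tags() (
  dir=$1; marker=$2
  # -F: treat the marker as a fixed string; -l: print only the matching file names.
  # "--" keeps a marker that starts with "-" from being read as an option.
  find "$dir" -type f -name '*.java' -exec grep -lF -- "$marker" {} +
)
```

For example, `find_stale_tags "$TC_SRC" "V-$PREV_YEAR"` should print nothing once every prolog has been updated. Rely on the printed list rather than the exit status, which reflects whether grep matched.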

  4. Update JDK/JRE
    • Download JDK from SUN
    • Install JDK to /usr/local/Applications/Java
    • Update symbolic link of /usr/bin/java
    • Update symbolic link of /usr/bin/javac
    • Update ${JAVA_HOME} in ~/.cshrc (for javadoc)

    • Update the 2 JREs in ${TC}/bin/jreDist/
      • Linux
      • Windows
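Repointing the /usr/bin/java and /usr/bin/javac symbolic links in step 4 might look like the sketch below. The function name `repoint_link` is hypothetical, and the JDK directory name under /usr/local/Applications/Java depends on the downloaded version, so it is left as a placeholder in the usage note.

```shell
# Point LINK at TARGET, replacing any existing link, then print where it now points.
# Hypothetical helper. Usage: repoint_link TARGET LINK
repoint_link() (
  target=$1; link=$2
  # Refuse to create a dangling link if the new JDK binary is not there.
  [ -e "$target" ] || { echo "missing target: $target" >&2; exit 1; }
  # -s symbolic, -f replace an existing link, -n do not follow LINK if it is a link to a directory.
  ln -sfn "$target" "$link" && readlink "$link"
)
```

A call would then have the form `repoint_link /usr/local/Applications/Java/<jdk-dir>/bin/java /usr/bin/java`, with `<jdk-dir>` replaced by the directory that was actually installed, and the same again for javac.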
  5. Update Installation Program
    • Update ${project.year} in ${TC}/install/build.xml
    • Update ${TC}/install/sources/gov/nih/nlm/nls/tc/install/Setup/Param.java
      • VERSION
      • JRE_DIR
      • DATABASE_NAME
    • Update scripts in ${TC}/install/bin/*
      • TC_YEAR
      • JRE_VERSION
      • CLASSPATH
  6. Update DB
    • Download the latest version of HyperSQL DB and copy 3 files to ${TC}/lib/jdbcDrivers
      • hsqldb.jar
      • hsqldb_lic.txt
      • hypersonic_lic.txt
  7. Reload data to Database (if it is upgraded)
    • cd /bin/loadDb/
    • Change ${PREV_YEAR} to ${YEAR} in ${TC}/bin/loadDb/1.CreateDb
    • Load to new Database
      • Create DB
      • Load data to database
        • Word-Jd Scores
        • Mh-Jd Scores
        • Sh-Jd Scores
        • Word-St Scores
      • Change readonly=true in tc${YEAR}.properties
      • Check that the result is the same
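The final check in step 7 can be done by saving the output of the same run before and after the reload and comparing the two files. This is only a sketch: the function name `same_results` and the idea of capturing results in plain text files are assumptions, not something the TC package prescribes.

```shell
# Compare two saved result files; print SAME, or DIFFERENT plus the first differing lines.
# Hypothetical helper. Usage: same_results OLD_FILE NEW_FILE
same_results() (
  if cmp -s "$1" "$2"; then
    echo "SAME"
  else
    echo "DIFFERENT"
    # Show only the start of the diff so a large mismatch stays readable.
    diff "$1" "$2" | head -n 20
    exit 1
  fi
)
```

The non-zero exit status on a mismatch makes it easy to stop a release script early.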
  8. Integrate with New JDI Training Data
    • Generate JDI dataset, see TC Preprocess procedures
    • Load to new Database
      • Create DB
      • Change hsqldb.cache_file_scale=8 in tc${YEAR}.properties
      • Load data to database
        • Word-Jd Scores
        • Mh-Jd Scores
        • Sh-Jd Scores
      • Change readonly=true in tc${YEAR}.properties
    • Find and update Max. Signal
    • Test JDI similarity between ${YEAR} and ${PREV_YEAR}
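The property changes in steps 7 and 8 (hsqldb.cache_file_scale=8 before loading, readonly=true afterwards) could be applied with a small helper like the one below. It is a sketch under two assumptions: the properties file uses plain `key=value` lines with no spaces around the `=`, and the function name `set_prop` is hypothetical.

```shell
# Set KEY=VALUE in a Java-style properties file.
# An existing "KEY=..." line is rewritten; otherwise the pair is appended at the end.
# Hypothetical helper. Usage: set_prop FILE KEY VALUE
set_prop() (
  file=$1; key=$2; value=$3
  tmp="$file.tmp.$$"
  # awk compares the text before the first "=" literally, so dots in KEY are not wildcards.
  awk -v k="$key" -v v="$value" '
    BEGIN { FS = "=" }
    $1 == k { print k "=" v; found = 1; next }
    { print }
    END { if (!found) print k "=" v }
  ' "$file" > "$tmp" && cat "$tmp" > "$file" && rm -f "$tmp"
)
```

For example, `set_prop tc${YEAR}.properties hsqldb.cache_file_scale 8` before the load and `set_prop tc${YEAR}.properties readonly true` once the data is in.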
  9. Integrate with New STI Training Data
    • Generate STI dataset, see TC Preprocess procedures
    • Use STRI to refine StDocument
    • Load data to database
      • Word-St Scores
    • Test STI and STRI through WSD data collection set
  10. Complete SCRs for ${YEAR} release
    • Update version ${YEAR}
      • ${TC_SRC}/Tools/Jdi.java
      • ${TC_SRC}/Tools/Sti.java
      • ${TC_SRC}/Tools/Stri.java
      • ${TC_SRC}/Tools/StWsd.java
      • ${TC_SRC}/Tools/Mlt.java
    • Update default value for Max. normalized signal (from observation of file wordSignalWcDcGt1.txt)
      • MAX_SIGNAL in ${TC_SRC}/FilterApi/LegalWordsOption.java
    • Update -rv:YEAR option
      • ${TC_SRC}/Lib/TcSystemOption.java
    • Standardize Java source code
      shell> cd ${LVG}/Components/BaselineCode/bin/
      shell> ModifyTcJavaCode
      ${YEAR}
      2
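Before moving on from step 10, it can help to confirm that each of the files listed above now mentions the new year. The sketch below only checks that the year string occurs somewhere in each file; it does not know what the version line looks like, and the function name `missing_string` is hypothetical.

```shell
# Print each listed file that does NOT contain the given string.
# Hypothetical helper. Usage: missing_string STRING FILE...
missing_string() (
  s=$1; shift
  for f in "$@"; do
    # -q: quiet, -F: fixed string; report the file only when grep finds nothing.
    grep -qF -- "$s" "$f" || echo "$f"
  done
)
```

A call such as `missing_string "$YEAR" "$TC_SRC"/Tools/Jdi.java "$TC_SRC"/Tools/Sti.java "$TC_SRC"/Tools/Stri.java "$TC_SRC"/Tools/StWsd.java "$TC_SRC"/Tools/Mlt.java "$TC_SRC"/Lib/TcSystemOption.java` should print nothing when all files have been updated.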
  11. Update other software components in the package
    • ${TC}/bin
      • Update ${YEAR} in ${TC}/bin/runProg
      • Update SCR_NO in ${TC}/bin/genBuildInfo
    • ${TC}/data
      • Modify ROOT_DIR=AUTO_MODE in ${TC}/data/Config/tc.properties
    • ${TC}/docs
      • Modify ${YEAR} in ${TC}/docs/updateDoc
    • Example Codes
      • Update ${TC_YEAR} in ${TC_EXAMPLE}/bin/runExample
  12. Installation Test
    • Update ${TC}/install/Msg/jdiGold.txt (for the new results)
      => This needs to be done after the new database is reloaded.
  13. Compile & Pack
    • shell> cd ${TC}
    • Update ${TC}/genBuildInfo

    • shell> ant clean
    • shell> ant release
    • shell> cd ..
    • shell> gtar -czvf tc${YEAR}.tgz tc${YEAR}
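After packing in step 13, a quick listing can confirm that the archive unpacks into a single tc${YEAR} directory. This sketch uses `tar` where the procedure uses `gtar`; that substitution assumes the system `tar` understands `-z`, as GNU tar and bsdtar do. The function name `pack_release` is hypothetical.

```shell
# Create NAME.tgz from directory NAME inside PARENT, then print the distinct
# top-level entries of the archive (ideally just NAME).
# Hypothetical helper. Usage: pack_release PARENT NAME
pack_release() (
  parent=$1; name=$2
  cd "$parent" || exit 1
  tar -czf "$name.tgz" "$name" || exit 1
  # Keep only the first path component of every archive member.
  tar -tzf "$name.tgz" | cut -d/ -f1 | sort -u
)
```

Anything other than a single line reading tc${YEAR} suggests the archive was built from the wrong directory.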
  14. Update web site
    • Update documents
      • Change ${PREV_YEAR} to ${YEAR} in ${TC_WEB}/${YEAR}/Home/topMenu*.html
      • Change ${PREV_YEAR} to ${YEAR} in ${TC_WEB}/${YEAR}/web/interactiveTools.html
  15. Test
    • Install TC.${YEAR} to ${PROJECTS}
      • shell> mv tc${YEAR}.tgz ${PROJECTS}/TC
      • shell> cd ${PROJECTS}/TC
      • shell> gtar -xzvf tc${YEAR}.tgz
      • shell> cd ${PROJECTS}/TC/tc${YEAR}
      • shell> ./install/bin/install_linux
    • Add links to previous data sets
      • ln -sf /export/home/lu/Development/TC/tcData/data.2007 data.2007
      • ln -sf /export/home/lu/Development/TC/tcData/data.2008 data.2008
      • ln -sf /export/home/lu/Development/TC/tcData/data.2009 data.2009
      • ln -sf /export/home/lu/Development/TC/tcData/data.2010 data.2010
      • ln -sf ./data data.2011
    • Test all programs for all datasets
      • Test all programs: jdi, sti, stri, stWsd, mlt
      • Test all versions: -rv:${YEAR}
    • Test JDI dataset similarity
    • Test STI with the WSD data collection set
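The links to previous data sets in step 15 follow one pattern per year, so they can be created in a loop. A minimal sketch, assuming the previous data sets all live under one directory as data.<year>; the function name `link_datasets` is hypothetical.

```shell
# In the current directory, create data.<year> links to DATA_ROOT/data.<year>.
# Hypothetical helper. Usage: link_datasets DATA_ROOT YEAR...
link_datasets() (
  root=$1; shift
  for y in "$@"; do
    # -n keeps an existing data.<year> link from being followed and nested into.
    ln -sfn "$root/data.$y" "data.$y" || exit 1
  done
)
```

With the paths shown above, this would be `link_datasets /export/home/lu/Development/TC/tcData 2007 2008 2009 2010`, followed by the separate `ln -sf ./data data.2011` for the current year.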
  16. Update Web Applications
    • Web Tools:
    • TCAT: