Text Categorization

Release Notes

These release notes summarize the new and enhanced features and the bug fixes in the most recent releases of the Java Text Categorization Tools.

Version 2011

The 2011 release is the 5th version of the Java Text Categorization Tools. It includes 5 main script programs:

  • mlt (MEDLINE Tokenizer)
  • jdi (Journal Descriptor Indexing)
  • sti (Semantic Type Indexing)
  • stri (Semantic Type Indexing, Real-Time)
  • stWsd (ST Word Sense Disambiguation)

This release completes 5 software change requests (SCRs), described below.

I. Main Feature Enhancements

  • Release Package
    • Distributed with JRE 1.6.0_21
    • Distributed with HSqlDb 2.0.0 (HyperSonic SQL DB)
    • Provides version information in ${TC_DIR}/data/versions.txt
  • TC Data
    • Released with latest tables from MEDLINE.2011 and Metathesaurus.2010AB
    • Released with latest tables for JDI/STI/STRI
    • Updated the default value of the Max. cutoff for normalized count
    • Compatible with the following data sets:
      • tcData.2010
      • tcData.2009
      • tcData.2008
      • tcData.2007
  • New Features
    • Added support for StWsd to accept ST abbreviations and TUIs

II. Bug Fixes

  • None