Text Categorization

What is new?

The 2011 version of the Text Categorization tool is the 5th official public release. It is implemented in pure Java and can handle UTF-8. Some specifications of this release are listed below.

System

  • Upgraded to Java 1.6.0.21
  • Upgraded to HSqlDb 2.0.0
  • Provided scripts for command-line tools

Data

  • Used MEDLINE.2011 for citations created in 2008, 2009, and 2010
  • Used Metathesaurus.2010AB
  • Used lsi2011.xml
  • Used the latest data sets for JDI, STI, and STRI
  • Updated the default value of Max. normalized count
  • Compatible with the following data sets:
    • tcData.2010
    • tcData.2009
    • tcData.2008
    • tcData.2007

Features

  • Added new features in StWsd to accept ST abbreviations and TUIs as ST candidates