Text Categorization

TC Command-line Tools & Java API

The TC package includes Java APIs along with the Journal Descriptors Index (JDI), the Semantic Type Index (STI), the Semantic Type Index, real-time (STRI), and the MEDLINE Tokenizer (MLT). They are briefly described as follows:

I. Structure

  • Development dir: ${DEV}/TC/tc${YEAR}
    • bin: command line tools
    • data: all TC data files
    • examples: Java sample code that uses the TC Java APIs
    • install: TC installation programs
    • lib: all jar files
    • sources: Java source code
II. Directories for Java Source Code
  • Located at ${TC}/sources/gov/nih/nlm/nls/tc/Api
    • Api: includes API classes for JDI, STI, STRI, and MLT
    • Db: database interface classes
    • FilterApi: includes API for input filter, output filter, and legal words
    • Jdi: JDI related classes
    • Lib: commonly used library classes
    • LoadDb: Java classes to load/create database tables
    • MedLine: MEDLINE related classes
    • Sti: STI related classes
    • Stri: STRI related classes
    • Tools: Java classes for command line tools
    • Util: commonly used utility classes
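
Under the standard Java source layout, each directory below the sources root corresponds to one package. The following is a minimal sketch of that mapping, not part of TC: the class PackageNameSketch and the method ToPackageName are hypothetical names, and the relative path is taken from the location listed above on the assumption that the sources directory is the source root.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the TC distribution.
class PackageNameSketch
{
    // Convert a directory path (relative to the sources root) into the
    // dotted Java package name that the standard layout implies.
    static String ToPackageName(String relativeDir)
    {
        Path dirPath = Paths.get(relativeDir);
        StringBuilder packageName = new StringBuilder();
        for (Path curPart : dirPath)
        {
            if (packageName.length() > 0)
            {
                packageName.append('.');
            }
            packageName.append(curPart.toString());
        }
        return packageName.toString();
    }

    public static void main(String[] args)
    {
        // Assumed relative path, taken from the location listed above
        System.out.println(ToPackageName("gov/nih/nlm/nls/tc/Api"));
        // prints gov.nih.nlm.nls.tc.Api
    }
}
```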
III. Naming Convention for Java Source Code

The naming convention follows the Lexical Systems Group Java coding standard. A brief summary follows:

  • Directories:
    • Begin with an uppercase letter followed by lowercase letters
    • Use an uppercase first letter to separate multiple words
    • No '_' should be used.
  • Packages:
    • Same as directories
  • Java files:
    • Begin with an uppercase letter followed by lowercase letters
    • Use an uppercase first letter to separate multiple words
    • No '_' should be used.
  • Classes:
    • Same as Java files
  • Methods:
    • Begin with an uppercase letter followed by lowercase letters
    • Use an uppercase first letter to separate multiple words
    • No '_' should be used
  • Variables & Parameters:
    • Begin with a lowercase letter
    • Use an uppercase first letter to separate multiple words
    • No '_' should be used
  • Class Data Members:
    • Begin with a lowercase letter and end with '_'
    • Use an uppercase first letter to separate multiple words
    • No other '_' should be used
  • Constants:
    • All uppercase letters
    • Use '_' to separate multiple words
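
The rules above can be sketched in a single small class. This is an illustration only: WordCounter and every member in it are hypothetical names chosen to show the convention and do not come from the TC source code.

```java
// Hypothetical class illustrating the naming rules; not part of TC.
// Class (and Java file) name: uppercase first letter, no '_'.
class WordCounter
{
    // Constant: all uppercase letters, '_' separates words
    static final int MAX_WORD_LENGTH = 64;

    // Class data member: lowercase first letter, ends with '_'
    private int wordCount_ = 0;

    // Method: uppercase first letter; parameter: lowercase first letter
    void AddLine(String inLine)
    {
        // Local variable: lowercase first letter, uppercase separates words
        String[] tokenList = inLine.trim().split("\\s+");
        for (String curToken : tokenList)
        {
            if (!curToken.isEmpty() && curToken.length() <= MAX_WORD_LENGTH)
            {
                wordCount_++;
            }
        }
    }

    int GetWordCount()
    {
        return wordCount_;
    }

    public static void main(String[] args)
    {
        WordCounter wordCounter = new WordCounter();
        wordCounter.AddLine("text categorization tools");
        System.out.println(wordCounter.GetWordCount()); // prints 3
    }
}
```

Note that method names starting with an uppercase letter differ from the common Java practice of lowercase method names; the sketch follows the convention as stated here.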