Text Categorization

jdi



Introduction

The jdi tool uses statistical associations between words and JDs, between MHs and JDs, and between SHs and JDs, derived from a training set of MEDLINE citations. The word-JD, MH-JD, and SH-JD scores are precalculated and loaded into a database. jdi accepts text phrases, MeSH terms, or a combination of both as input. Text input is filtered first (word extraction, stopword removal, minimum word length, etc.). jdi then calculates the average score over all inputs and sends the ranked JDs with their scores to the output.
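The scoring step described above can be illustrated with a minimal sketch. It is not the jdi source code: the class name JdiSketch, the in-memory score table (the real tool reads precalculated scores from a database), the stopword list, the minimum word length, the sample scores, and the choice to skip words that have no entry in the table are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class JdiSketch {
    // Hypothetical filter settings; the real stopword list and length limit may differ.
    static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList("the", "and", "for", "with"));
    static final int MIN_WORD_LENGTH = 3;

    /** Lowercases the text, splits on non-letters, and drops stopwords and short words. */
    static List<String> extractWords(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase().split("[^a-z]+")) {
            if (w.length() >= MIN_WORD_LENGTH && !STOPWORDS.contains(w)) {
                words.add(w);
            }
        }
        return words;
    }

    /** Averages each JD's score over the input words that have an entry in the table. */
    static Map<String, Double> averageScores(List<String> words,
                                             Map<String, Map<String, Double>> wordJdScores) {
        Map<String, Double> sums = new HashMap<>();
        int scored = 0;
        for (String w : words) {
            Map<String, Double> jdScores = wordJdScores.get(w);
            if (jdScores == null) {
                continue; // assumption: words without scores are ignored
            }
            scored++;
            for (Map.Entry<String, Double> e : jdScores.entrySet()) {
                sums.merge(e.getKey(), e.getValue(), Double::sum);
            }
        }
        Map<String, Double> averages = new HashMap<>();
        for (Map.Entry<String, Double> e : sums.entrySet()) {
            averages.put(e.getKey(), e.getValue() / scored);
        }
        return averages;
    }

    /** Returns JD names ordered by descending average score. */
    static List<String> rank(Map<String, Double> averages) {
        List<String> jds = new ArrayList<>(averages.keySet());
        jds.sort((a, b) -> Double.compare(averages.get(b), averages.get(a)));
        return jds;
    }

    /** Made-up word-JD scores, standing in for the precalculated database. */
    static Map<String, Map<String, Double>> sampleTable() {
        Map<String, Map<String, Double>> table = new HashMap<>();
        Map<String, Double> heart = new HashMap<>();
        heart.put("Cardiology", 0.8);
        heart.put("Neurology", 0.1);
        Map<String, Double> attack = new HashMap<>();
        attack.put("Cardiology", 0.5);
        attack.put("Neurology", 0.2);
        table.put("heart", heart);
        table.put("attack", attack);
        return table;
    }

    public static void main(String[] args) {
        List<String> words = extractWords("Heart attack of the xyz");
        Map<String, Double> averages = averageScores(words, sampleTable());
        for (String jd : rank(averages)) {
            System.out.println(jd + " " + averages.get(jd));
        }
    }
}
```

In this sketch, a JD that is missing from one word's score map contributes nothing to the sum for that word but is still divided by the number of scored words. Whether jdi treats missing scores the same way is not stated here.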

jdi is the core methodology of the TC tools. It is used by the Sti and Stri programs, and it supports text categorization, content indexing, record retrieval, and word sense disambiguation.

Set Up

Follow the installation instructions to install the Text Categorization tools and run the jdi program. Check the following items only if you did not use the provided script to install the Text Categorization tools.

  • CLASSPATH:
    1. Include the Text Categorization tools distribution jar file, ${TC_DIR}/lib/tc2011dist.jar, in your CLASSPATH.
    2. Include the tc top directory in your CLASSPATH.

  • Configuration File: assign the full path of the top directory of tc2011 to a variable named ROOT_DIR in the configuration file, data/Config/tc.properties.
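As a quick sanity check of the configuration item above, the following sketch reads a properties file and confirms that ROOT_DIR is set. It assumes that tc.properties uses the standard Java properties format (key=value), which the file extension suggests but which is not confirmed here. The class name ConfigCheck and the sample path /opt/tc2011 are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

class ConfigCheck {
    /** Loads properties from the reader and returns ROOT_DIR, or throws if it is missing or blank. */
    static String readRootDir(Reader in) throws IOException {
        Properties props = new Properties();
        props.load(in);
        String rootDir = props.getProperty("ROOT_DIR");
        if (rootDir == null || rootDir.trim().isEmpty()) {
            throw new IllegalStateException("ROOT_DIR is not set in the configuration file");
        }
        return rootDir.trim();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file content; replace the path with your own tc2011 top directory.
        String sample = "ROOT_DIR=/opt/tc2011\n";
        System.out.println(readRootDir(new StringReader(sample)));
    }
}
```

To check a real installation, pass a java.io.FileReader opened on data/Config/tc.properties instead of the StringReader.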

Test Run

Input

jdi takes two types of input: text phrases and MeSH terms, which may also be combined.

Output

jdi calculates the average JD scores of the input text by both word count and document count, then displays the top 10 JDs with their scores for each count. The top-ranked JDs by document count are shown at the end as the overall JD ranking.

Rank | JD Scores | JD Id | JD name
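The top-10 selection and the four output columns can be sketched as follows. This is an illustration only: the class name TopJdSketch, the JD Ids, the scores, and the pipe-separated print format are hypothetical and are not taken from actual jdi output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TopJdSketch {
    /** One JD with its averaged score. */
    static final class ScoredJd {
        final String id;
        final String name;
        final double score;

        ScoredJd(String id, String name, double score) {
            this.id = id;
            this.name = name;
            this.score = score;
        }
    }

    /** Returns at most n JDs, ordered by descending score; rank is the list position plus one. */
    static List<ScoredJd> topN(List<ScoredJd> scored, int n) {
        List<ScoredJd> sorted = new ArrayList<>(scored);
        sorted.sort((a, b) -> Double.compare(b.score, a.score));
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    public static void main(String[] args) {
        // Made-up Ids and scores for illustration.
        List<ScoredJd> scored = Arrays.asList(
                new ScoredJd("JD-A", "Neurology", 0.15),
                new ScoredJd("JD-B", "Cardiology", 0.65),
                new ScoredJd("JD-C", "Pediatrics", 0.40));
        List<ScoredJd> top = topN(scored, 10);
        for (int i = 0; i < top.size(); i++) {
            ScoredJd jd = top.get(i);
            // Columns: rank, JD score, JD Id, JD name (separator chosen for this sketch)
            System.out.println(String.format("%d|%.4f|%s|%s", i + 1, jd.score, jd.id, jd.name));
        }
    }
}
```

Running the same selection once on word-count scores and once on document-count scores would give the two lists described above.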

jdi Options

Please refer to the design document.