
Text Categorization

jdi



Introduction

The jdi tool uses statistical associations between words and JDs, between MHs and JDs, and between SHs and JDs, derived from a training set of MEDLINE citations. The word-JD, MH-JD, and SH-JD scores are pre-calculated and loaded into a database. jdi accepts text phrases, MeSH terms, or a combination of both as input. Text input is filtered first, for example by word extraction algorithms, stopword removal, and a minimum word length. jdi then calculates the average score over all inputs and outputs the ranked JDs with their scores.
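The filtering and averaging steps described above can be sketched as follows. This is a minimal illustration in Python, not the jdi implementation: the score table, the JD names, the stopword list, and the minimum word length are made-up values, and the names `extract_words` and `rank_jds` are hypothetical. It covers word input only, and it assumes that words without a score entry are ignored when averaging.

```python
import re
from collections import defaultdict

# Hypothetical word-JD scores; the real tool loads pre-calculated
# scores from its database.
WORD_JD_SCORES = {
    "heart": {"Cardiology": 0.8, "Neurology": 0.1},
    "attack": {"Cardiology": 0.4, "Neurology": 0.2},
}
STOPWORDS = {"the", "of", "a", "and"}  # assumed stopword list
MIN_WORD_LENGTH = 3                    # assumed minimum word length


def extract_words(text):
    """Lowercase the text, split on non-letters, drop stopwords and short words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOPWORDS and len(w) >= MIN_WORD_LENGTH]


def rank_jds(text, scores=WORD_JD_SCORES):
    """Average each JD's score over the scored input words, highest first."""
    known = [w for w in extract_words(text) if w in scores]
    if not known:
        return []
    totals = defaultdict(float)
    for word in known:
        for jd, score in scores[word].items():
            totals[jd] += score
    averages = {jd: total / len(known) for jd, total in totals.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)
```

Calling `rank_jds("the heart attack")` with the table above averages the scores of "heart" and "attack" for each JD and returns the JDs ordered by that average.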

jdi is the core methodology of the TC tools and is used by the Sti and Stri programs. It is used to categorize text, index content, retrieve records, and perform word sense disambiguation.

Set Up

Follow the installation instructions to install the Text Categorization tools and run the jdi program. Check the following items only if you did not use the provided script to install the Text Categorization tools.

  • CLASSPATH:
    1. Include the Text Categorization tools distribution jar file, ${TC_DIR}/lib/tc2011dist.jar, in your CLASSPATH.
    2. Include the tc top directory in your CLASSPATH.

  • Configuration File: assign the full path of the top directory of tc2011 to a variable named ROOT_DIR in the configuration file, data/Config/tc.properties.
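For a manual installation, the items above can be checked with a small script. The following is a sketch only: `read_properties` and `check_setup` are hypothetical helpers, not part of the distribution. It assumes that the configuration file uses plain `KEY=VALUE` lines and that `data/Config/tc.properties` is located relative to the tc top directory.

```python
from pathlib import Path


def read_properties(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # or ! comments."""
    props = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_setup(tc_dir):
    """Return a list of problems with a manual set-up; an empty list means none found."""
    tc_dir = Path(tc_dir)
    problems = []

    # The distribution jar is expected under lib/ in the top directory.
    jar = tc_dir / "lib" / "tc2011dist.jar"
    if not jar.is_file():
        problems.append(f"missing jar: {jar}")

    # Assumed location of the configuration file, relative to the top directory.
    config = tc_dir / "data" / "Config" / "tc.properties"
    if not config.is_file():
        problems.append(f"missing configuration file: {config}")
    else:
        root = read_properties(config).get("ROOT_DIR")
        if not root:
            problems.append("ROOT_DIR is not set")
        elif Path(root).resolve() != tc_dir.resolve():
            problems.append(f"ROOT_DIR points to {root}, expected {tc_dir}")
    return problems
```

Running `check_setup` on the top directory of the installation returns an empty list when the jar is present and ROOT_DIR points to that directory. It does not inspect the CLASSPATH, which still has to be set as described above.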

Test Run

Input

jdi takes two types of input:

Output

jdi calculates the average JD scores of the input text by both word count and document count, then displays the top 10 JDs with their scores for each count. The top-ranked JDs by document count are shown at the end as the overall JD rank.

Rank | JD Scores | JD Id | JD name
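The shape of this output can be sketched as follows. This is an illustration under stated assumptions, not the jdi output code: `top_jds` and `report` are hypothetical names, the input dictionaries stand in for average scores that have already been computed, and the JD Id column is omitted because the identifiers are not given here.

```python
def top_jds(avg_scores, n=10):
    """Return the n highest-scoring JDs as (rank, score, name) rows, rank starting at 1."""
    ordered = sorted(avg_scores.items(), key=lambda item: item[1], reverse=True)
    return [(rank, score, jd) for rank, (jd, score) in enumerate(ordered[:n], start=1)]


def report(word_count_scores, doc_count_scores, n=10):
    """Collect the top-n lists for both counts; the overall rank follows document count."""
    by_document_count = top_jds(doc_count_scores, n)
    return {
        "word_count": top_jds(word_count_scores, n),
        "document_count": by_document_count,
        "overall": by_document_count,
    }
```

For example, `report({"A": 0.2, "B": 0.5}, {"A": 0.7, "B": 0.3}, n=1)` ranks B first by word count and A first by document count, so A is the overall top JD.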

jdi Options

Please refer to the design document.