Text Categorization

stri

The Stri tool is built on the Jdi methodology. It uses ST (semantic type) documents; an ST document is the set of one-word UMLS Metathesaurus strings that belong to an ST. Stri accepts text phrases or MeSH terms as input. Text input is first filtered (word extraction algorithms, stopwords, minimum word length, etc.). Stri then ranks the STs for the input by comparing the JDI of the input (the result of running the Jdi tool on the input in real time) with the pre-calculated JDI of each ST document, and sends the ranked STs with their scores to the output.
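The filter-then-rank flow described above can be sketched as follows. This is only an illustration, not Stri's actual code: the whitespace tokenizer, the stopword list, the choice of cosine similarity, and the toy score vectors are all assumptions, since this page does not specify the similarity measure or the data layout.

```python
import math

# Hypothetical stopword list; Stri's real list is not given in this document.
STOPWORDS = {"the", "of", "and", "a", "in"}

def filter_words(text, min_len=2):
    """Lowercase, split on whitespace, drop stopwords and short words."""
    words = text.lower().split()
    return [w for w in words if w not in STOPWORDS and len(w) >= min_len]

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts (assumed measure)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_sts(input_vec, st_vecs):
    """Return (score, ST) pairs, highest score first; ties broken by ST name."""
    scored = [(cosine(input_vec, vec), st) for st, vec in st_vecs.items()]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))

# Toy, made-up score vectors (not real JDI output).
input_vec = {"Cardiology": 0.8, "Surgery": 0.2}
st_vecs = {
    "medd": {"Cardiology": 0.6, "Surgery": 0.4},
    "sosy": {"Neurology": 1.0},
}
ranked = rank_sts(input_vec, st_vecs)
for rank, (score, st) in enumerate(ranked, start=1):
    print(f"{rank}|{score:.4f}|{st}")
```

In this sketch the pre-calculated ST vectors play the role of the ST documents' JDI, and only the input vector would need to be computed at request time.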

Set Up

Follow the installation instructions to install the Text Categorization tools and run the stri program. Check the following items only if you did not use the provided script to install the Text Categorization tools.

CLASSPATH:
1. Include the Text Categorization tools distribution jar file, ${TC_DIR}/lib/tc2011dist.jar, in your CLASSPATH.
2. Include the TC top directory in your CLASSPATH.

Configuration File: assign the full path of the top directory of tc2011 to a variable named ROOT_DIR in the configuration file, data/Config/tc.properties.
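As a quick sanity check of the configuration step, the sketch below reads properties-style text and reports the ROOT_DIR value. It assumes tc.properties uses simple key=value lines with # or ! comments; the sample content and the path in it are placeholders, not a real install location.

```python
def read_properties(text):
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props

# Hypothetical file content standing in for data/Config/tc.properties.
sample = """
# tc.properties
ROOT_DIR=/path/to/tc2011
"""
props = read_properties(sample)
print(props.get("ROOT_DIR", "ROOT_DIR is not set"))
```

To check a real installation, the same function could be given the text of the actual configuration file instead of the sample string.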

Test Run

Run the Java program

Enter the command:


> stri -p
- Please input a term (type "Ctl-d" to quit) >
heart valve
--> Input: [heart valve]
--- ST scores (x 1) and rank based on word count ---
clna|T201|Clinical Attribute
1|0.5940|clna|T201|Clinical Attribute
2|0.5599|spco|T082|Spatial Concept
3|0.5189|patf|T046|Pathologic Function
4|0.4769|drdd|T203|Drug Delivery Device
5|0.4528|medd|T074|Medical Device
6|0.3861|fndg|T033|Finding
7|0.3530|ftcn|T169|Functional Concept
8|0.2909|diap|T060|Diagnostic Procedure
9|0.2907|clas|T185|Classification
10|0.2870|sosy|T184|Sign or Symptom
--- ST scores (x 1) and rank based on document count ---
clna|T201|Clinical Attribute
1|0.7183|clna|T201|Clinical Attribute
2|0.6026|spco|T082|Spatial Concept
3|0.5711|patf|T046|Pathologic Function
4|0.5068|drdd|T203|Drug Delivery Device
5|0.5061|medd|T074|Medical Device
6|0.4524|fndg|T033|Finding
7|0.3934|ftcn|T169|Functional Concept
8|0.3563|sosy|T184|Sign or Symptom
9|0.3522|diap|T060|Diagnostic Procedure
10|0.3353|clas|T185|Classification
--- Overall ST rank ---
clna|T201|Clinical Attribute|dc

where:

stri: the script that runs the Stri Java class
-p: sets the Stri system option to show a prompt (try the -h option!)

Input

Stri takes text as input, such as the phrase "heart valve" in the test run above.

Output

Stri calculates the average ST scores of the input text for both word counts and document counts and sends the top-ranked ST to the output. If the detail flag, -d, is used, the results include the rank and ST scores in the following format:

Rank	ST Scores	ST abbreviation	ST name
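For downstream processing, the ranked lines can be parsed with a few lines of code. The sketch below follows the sample output in the Test Run section, where each ranked line has five pipe-separated fields, including an identifier such as T201 between the abbreviation and the name. The function name and dictionary keys are hypothetical, and the exact field layout of other Stri versions is not confirmed by this page.

```python
def parse_ranked_line(line):
    """Parse 'rank|score|abbreviation|id|name'; return None for other lines."""
    parts = line.strip().split("|")
    if len(parts) != 5:
        return None
    rank, score, abbr, st_id, name = parts
    try:
        return {
            "rank": int(rank),
            "score": float(score),
            "abbr": abbr,
            "id": st_id,
            "name": name,
        }
    except ValueError:
        # Five fields, but rank or score is not numeric: not a ranked line.
        return None

# Lines copied from the sample test run.
lines = [
    "--- ST scores (x 1) and rank based on word count ---",
    "clna|T201|Clinical Attribute",
    "1|0.5940|clna|T201|Clinical Attribute",
    "2|0.5599|spco|T082|Spatial Concept",
]
rows = [row for row in map(parse_ranked_line, lines) if row is not None]
for row in rows:
    print(row["rank"], row["abbr"], row["score"])
```

Section headers and the summary lines are skipped because they do not split into five fields.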

Stri Options

Please refer to the design document.