Text Categorization

Frequently Asked Questions

(Please read before asking a question)

How can I ask a question?
See Contact Us
Where are the command line tools?
The command line tools are under "$TC_DIR/bin".

What does the year in the TC database name mean?
The year in the database name is the year of the MEDLINE release used to generate the TC tables. For tc.2007 and tc.2008, the 2004 and 2008 MEDLINE data were used, respectively. Please see the table below for details on the data source versions:

TC version	DB name	MEDLINE	Metathesaurus	lsi.xml
tc.2007	tc2004	2004 (99-01)	2003AC	2006
tc.2008	tc2008	2008 (05-07)	2007AC	2007
tc.2009	tc2009	2009 (06-08)	2008AB	2009
tc.2010	tc2010	2010 (07-09)	2009AA	2010
tc.2011	tc2011	2011 (08-10)	2010AB	2011

Can I install TC on the Solaris platform?
The installation program of the TC package supports only Linux and Windows due to a policy change. However, users may install the TC package manually on Solaris or any other platform that supports Java.
What is the difference between STI and STRI?
STRI is Semantic Types Real-Time Indexing. It first uses JDI to index all input words, and then computes the cosine coefficient between the resulting JDI vector and each St-JD vector (from the StJd tables). STI improves on this method by pre-calculating the word-ST vector for every word and loading these vectors into the TC database. Accordingly, the results for a one-word input should be identical between STRI and STI, while the results for a multiple-word input should be similar but not identical. Please refer to STI or STRI for details.
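The difference between the two methods can be illustrated with a minimal Python sketch. This is not the TC implementation: the word and semantic type labels, the vector values, the summing of word vectors in the real-time path, and the averaging of pre-calculated scores in the STI-like path are all assumptions made only for illustration.

```python
import math

def cosine(u, v):
    """Cosine coefficient between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical word -> JD vectors (made-up numbers over 3 journal descriptors).
WORD_JD = {
    "word_a": [0.9, 0.1, 0.0],
    "word_b": [0.2, 0.7, 0.1],
}

# Hypothetical semantic type -> JD vectors (stand-ins for the StJd tables).
ST_JD = {
    "st_x": [0.8, 0.2, 0.0],
    "st_y": [0.1, 0.8, 0.1],
}

def stri_like_scores(words):
    """Real-time path: combine the word vectors first, then take the cosine.

    Assumption: the JDI vector of the input is the sum of its word vectors.
    """
    jd = [sum(column) for column in zip(*(WORD_JD[w] for w in words))]
    return {st: cosine(jd, vec) for st, vec in ST_JD.items()}

# Pre-calculated word -> ST scores, computed once and stored.
WORD_ST = {
    word: {st: cosine(jd, vec) for st, vec in ST_JD.items()}
    for word, jd in WORD_JD.items()
}

def sti_like_scores(words):
    """Pre-calculated path: look up each word's ST scores and average them.

    Assumption: per-word scores are combined by a plain average.
    """
    return {
        st: sum(WORD_ST[w][st] for w in words) / len(words)
        for st in ST_JD
    }

one_word = ["word_a"]
two_words = ["word_a", "word_b"]
print(stri_like_scores(one_word), sti_like_scores(one_word))
print(stri_like_scores(two_words), sti_like_scores(two_words))
```

With a single word both paths evaluate the same cosine, so the scores match. With several words, the cosine of a combined vector is generally not the average of the per-word cosines, which is one way the two methods could give similar but not identical results.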
How do I run TC with a previous data set, such as tc.2007 or tc.2008?
This feature is available starting with TC.2009. To use it:
- Install the data set under ${TC_DIR}
- Run the program with the desired data set by using the -rv:STR option
Please refer to the user documents (run other version of data set) for details.
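For scripted runs, the second step could be wrapped in a small helper such as the Python sketch below. It only assembles the command line from what is stated above (tools live under $TC_DIR/bin and the data set is selected with -rv:STR). The helper name, the tool name, and the value passed for STR are placeholders supplied by the caller, not names taken from the TC distribution, so check the user documents for the actual values.

```python
import os

def build_tc_command(tc_dir, tool, data_version, extra_args=()):
    """Return the argument list for running a TC tool against a chosen data set.

    tc_dir       -- the TC installation directory (the value of TC_DIR)
    tool         -- name of a command line tool under <tc_dir>/bin (placeholder)
    data_version -- the string to pass as STR in the -rv:STR option (placeholder)
    extra_args   -- any further options, passed through unchanged
    """
    executable = os.path.join(tc_dir, "bin", tool)
    return [executable, f"-rv:{data_version}", *extra_args]
```

The returned list can be handed to subprocess.run once the data set has been installed under ${TC_DIR}.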