Text Categorization

Frequently Asked Questions

(Please read before asking a question)

  • How can I ask a question?
    See Contact Us

  • Where are the command line tools?
    The command line tools are located under "$TC_DIR/bin".

  • What does the year in the TC database name mean?
    The year in the database name is the year of the MEDLINE data used to generate the TC tables. For tc.2007 and tc.2008, the 2004 and 2008 data were used, respectively. Please see the table below for details on the data source versions:

    TC version   DB name   MEDLINE        Metathesaurus   lsi.xml
    tc.2007      tc2004    2004 (99-01)   2003AC          2006
    tc.2008      tc2008    2008 (05-07)   2007AC          2007
    tc.2009      tc2009    2009 (06-08)   2008AB          2009
    tc.2010      tc2010    2010 (07-09)   2009AA          2010
    tc.2011      tc2011    2011 (08-10)   2010AB          2011

  • Can I install TC on the Solaris platform?
    Due to a policy change, the installation program of the TC package supports only Linux and Windows. However, users may install the TC package manually on Solaris or any other platform that supports Java.

  • What is the difference between STI and STRI?
    STRI is Semantic Types Real-Time Indexing. It first uses JDI to index all input words, and then computes the cosine coefficient between the resulting JDI vector and the St-JD vector (from the StJd tables). STI improves on this method by pre-calculating the word-ST vector for all words and loading these vectors into the TC database. Accordingly, the results for "one word" input should be identical between STRI and STI, while the results for "multiple words" input should be similar (not identical). Please refer to STI or STRI for details.
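
    The cosine coefficient mentioned above can be illustrated with a minimal sketch. This is not the TC implementation: the class name CosineSketch, the representation of each vector as a map from descriptor name to score, and the sample descriptor names are all assumptions made for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the TC package.
class CosineSketch {

    // Cosine coefficient between two sparse vectors keyed by descriptor name.
    // Returns 0.0 when either vector has zero length.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            double w = e.getValue();
            normA += w * w;
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += w * other;   // only shared descriptors contribute
            }
        }
        for (double w : b.values()) {
            normB += w * w;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Assumed stand-in for the JDI vector of the input words.
        Map<String, Double> inputVector = new HashMap<>();
        inputVector.put("DescriptorA", 0.6);
        inputVector.put("DescriptorB", 0.2);

        // Assumed stand-in for the St-JD vector of one semantic type.
        Map<String, Double> stVector = new HashMap<>();
        stVector.put("DescriptorA", 0.5);
        stVector.put("DescriptorC", 0.4);

        System.out.println(cosine(inputVector, stVector));
    }
}
```

    In this sketch, a semantic type whose vector points in nearly the same direction as the input vector scores close to 1, and one that shares no descriptors with the input scores 0.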

  • How do I run TC with a previous data set, such as tc.2007 or tc.2008?
    This feature was added after TC.2009 and takes two steps:
    • Install the data set under ${TC_DIR}
    • Run the program with the desired data set by using the -rv:STR option

    For details, please refer to the user documents: run other version of data set.