Text Categorization

stWsd



Introduction

The StWsd tool applies Sti/Stri to disambiguate ambiguous words. Sti uses the context (a phrase or one or more sentences) to disambiguate an ambiguous word whose meanings correspond to different semantic types (STs), called ST candidates. The ST candidate with the highest score (rank) is presumed to be the correct sense. STI scores are built from two main elements: St documents (stDocuments) and Journal Descriptor Indexing (JDI) scores. An St document is a set of one-word Metathesaurus strings associated with an ST. JDI is a methodology that gives consistent results for categorizing input text according to biomedical specialties, known as Journal Descriptors (JDs). An optimal St document contains the words that best represent its ST; the better the representation, the better the STI result. A new methodology has been developed to enhance St documents and so achieve better precision in word sense disambiguation (WSD).
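
The selection rule described above (the ST candidate with the highest score wins) can be sketched as follows. This is only an illustration: the class name BestStCandidate, the method pick, and the scores are hypothetical and are not part of the Text Categorization tools; how StWsd actually computes STI scores is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Illustrative only: not part of the Text Categorization tools API. */
final class BestStCandidate {

    /**
     * Returns the ST candidate with the highest score, or empty if the map
     * has no candidates. On a tie, the earliest candidate in iteration
     * order is kept.
     */
    static Optional<String> pick(Map<String, Double> scoresByCandidate) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : scoresByCandidate.entrySet()) {
            if (best == null || entry.getValue() > bestScore) {
                best = entry.getKey();
                bestScore = entry.getValue();
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        // Made-up scores for two ST candidates of an ambiguous word such as "cold":
        // dsyn (Disease or Syndrome) and npop (Natural Phenomenon or Process).
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("dsyn", 0.62);
        scores.put("npop", 0.31);
        System.out.println(pick(scores).orElse("(none)"));
    }
}
```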

The StWsd tool provides a simple interface for finding the best sense (ST) of an ambiguous word, given a set of ST candidates and a phrase or sentence(s) of context. In short, three inputs are required to run StWsd:

  • Ambiguous word
  • ST candidates (the possible senses, expressed as STs)
  • Context (phrase or sentence(s))
It also provides other options, such as using only the ambiguous sentences when the input is a paragraph, or showing details.

Set Up

Follow the installation instructions to install the Text Categorization tools and run the sti program. Check the following items only if you did not use the provided script to install the Text Categorization tools.

  • CLASSPATH:
    1. include the Text Categorization tools distribution jar file, ${TC_DIR}/lib/tc2011dist.jar, in your CLASSPATH.
    2. include the TC top directory in your CLASSPATH.

  • Configuration File: assign the full path of the top directory of tc2011 to a variable named ROOT_DIR in the configuration file, data/Config/tc.properties.
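
As a quick sanity check of the configuration step, the sketch below reads ROOT_DIR from configuration text. It assumes that tc.properties uses the standard Java key=value properties syntax, which this page does not confirm; the class name TcConfigCheck, the method rootDir, and the path /opt/tc2011 are hypothetical placeholders.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Illustrative only: not part of the Text Categorization tools API. */
final class TcConfigCheck {

    /**
     * Parses properties-style text and returns the trimmed ROOT_DIR value,
     * or null if the key is missing or blank.
     * Assumption: the file follows java.util.Properties syntax.
     */
    static String rootDir(String configText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(configText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = props.getProperty("ROOT_DIR");
        return (value == null || value.trim().isEmpty()) ? null : value.trim();
    }

    public static void main(String[] args) {
        // Placeholder path; substitute the real top directory of tc2011.
        String sample = "ROOT_DIR=/opt/tc2011\n";
        String root = rootDir(sample);
        System.out.println(root == null ? "ROOT_DIR is not set" : "ROOT_DIR = " + root);
    }
}
```

To check a real installation, the contents of data/Config/tc.properties could be read into a string and passed to rootDir.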

Test Run

Input

StWsd takes text as input.

Output

StWsd calculates the combined STI scores of the input text for both word counts and document counts and sends the highest-ranked ST among the ST candidates to the output. If the detail flag, -d, is used, the results also include the filtering details of the final words and the ST scores, in the following format:

Rank | ST Scores | ST abbreviation | ST name
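
A downstream program might read one detail row into its four fields as sketched below. The field separator is an assumption: this page lists the four columns but not the delimiter, so the sketch assumes pipe-separated fields. The class name StResultRow and the sample row are hypothetical.

```java
import java.util.regex.Pattern;

/** Illustrative only: not part of the Text Categorization tools API. */
final class StResultRow {
    final int rank;
    final double score;
    final String abbreviation;
    final String name;

    private StResultRow(int rank, double score, String abbreviation, String name) {
        this.rank = rank;
        this.score = score;
        this.abbreviation = abbreviation;
        this.name = name;
    }

    /**
     * Parses one row of the form rank|score|abbreviation|name.
     * Assumption: fields are separated by '|'; adjust if the real output differs.
     */
    static StResultRow parse(String line) {
        String[] fields = line.split(Pattern.quote("|"), -1);
        if (fields.length != 4) {
            throw new IllegalArgumentException("Expected 4 fields but got " + fields.length + ": " + line);
        }
        return new StResultRow(
                Integer.parseInt(fields[0].trim()),
                Double.parseDouble(fields[1].trim()),
                fields[2].trim(),
                fields[3].trim());
    }

    public static void main(String[] args) {
        // Made-up sample row, not real StWsd output.
        StResultRow row = parse("1|0.6200|dsyn|Disease or Syndrome");
        System.out.println(row.rank + " " + row.abbreviation + " " + row.name);
    }
}
```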

StWsd Options

Please refer to the design document.