Text Categorization

mlt



Introduction

The mlt tool tokenizes fields from MEDLINE citations. The Title, Abstract, and MH fields (starred MHs and SHs only), and combinations of these, are routinely extracted from a MEDLINE citation and tokenized. Other fields may be specified for tokenization as well.
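To make the idea of field extraction and tokenization concrete, here is a minimal Java sketch. It is not the mlt implementation. It assumes the citation is in the MEDLINE text display format (a tag of up to four characters, a hyphen in the fifth column, and indented continuation lines), that "starred" headings are MH values containing an asterisk, and that tokens are lowercased runs of letters and digits. The class name, method names, and sample citation are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative only: a hypothetical helper, not part of the Text Categorization tools.
final class MedlineFieldSketch {

    // Groups field values by tag and joins continuation lines (lines that start with a space).
    static Map<String, List<String>> parseFields(String citation) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        String tag = null;
        for (String line : citation.split("\\r?\\n")) {
            if (line.length() >= 5 && line.charAt(4) == '-' && !line.startsWith(" ")) {
                // New field: the tag occupies the first four columns.
                tag = line.substring(0, 4).trim();
                fields.computeIfAbsent(tag, k -> new ArrayList<>()).add(line.substring(5).trim());
            } else if (tag != null && line.startsWith(" ") && !line.trim().isEmpty()) {
                // Continuation of the most recent value of the current field.
                List<String> values = fields.get(tag);
                int last = values.size() - 1;
                values.set(last, values.get(last) + " " + line.trim());
            }
        }
        return fields;
    }

    // Assumed tokenization rule: lowercase, split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Keeps only MH values that carry an asterisk (assumed meaning of "starred").
    static List<String> starredHeadings(Map<String, List<String>> fields) {
        List<String> starred = new ArrayList<>();
        for (String mh : fields.getOrDefault("MH", Collections.emptyList())) {
            if (mh.contains("*")) {
                starred.add(mh);
            }
        }
        return starred;
    }

    public static void main(String[] args) {
        // Made-up citation used only for demonstration.
        String citation = String.join("\n",
            "PMID- 0000000",
            "TI  - Example title for a",
            "      made-up citation.",
            "AB  - A short abstract.",
            "MH  - Humans",
            "MH  - *Neoplasms/therapy");
        Map<String, List<String>> fields = parseFields(citation);
        System.out.println(tokenize(fields.get("TI").get(0)));
        System.out.println(tokenize(fields.get("AB").get(0)));
        System.out.println(starredHeadings(fields));
    }
}
```

The real tool may apply different tokenization rules; consult the design document for the actual behavior.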

Set Up

Follow the installation instructions to install the Text Categorization tools and run the mlt program. Check the following items only if you did not use the provided script to install the Text Categorization tools.

  • CLASSPATH:
    1. Include the Text Categorization tools distribution jar file, ${TC_DIR}/lib/tc2011dist.jar, in your CLASSPATH.
    2. Include the TC top directory in your CLASSPATH.

  • Configuration File: in the configuration file, data/Config/tc.properties, set the variable ROOT_DIR to the full path of the tc2011 top directory.
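As a quick sanity check of the configuration step, the following Java sketch reads a properties file and reports whether ROOT_DIR is set and points to an existing directory. The class and method names are hypothetical and not part of the distribution; the only details taken from this page are the key name ROOT_DIR and the relative path data/Config/tc.properties, which is assumed here to be resolved from the TC top directory.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper, not part of the Text Categorization distribution.
final class TcConfigCheck {

    // Returns the ROOT_DIR value, or null when the key is missing or blank.
    static String readRootDir(Reader in) {
        Properties props = new Properties();
        try {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String root = props.getProperty("ROOT_DIR");
        return (root == null || root.trim().isEmpty()) ? null : root.trim();
    }

    public static void main(String[] args) throws IOException {
        // Default path is the one named on this page; pass another path as the first argument.
        Path config = Paths.get(args.length > 0 ? args[0] : "data/Config/tc.properties");
        try (Reader in = Files.newBufferedReader(config)) {
            String root = readRootDir(in);
            if (root == null) {
                System.out.println("ROOT_DIR is not set in " + config);
            } else if (!Files.isDirectory(Paths.get(root))) {
                System.out.println("ROOT_DIR does not point to a directory: " + root);
            } else {
                System.out.println("ROOT_DIR = " + root);
            }
        }
    }
}
```

Run it from the TC top directory, or pass the full path of tc.properties as the first argument.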

Test Run

Input

Three inputs must be specified when running mlt:

Output

Each field is written to the output, with fields separated by a line separator.

Mlt Options

Please refer to the design document.