Text Categorization


  • Description:
    This option specifies which field to tokenize, identified by its field tag. Only one tag can be specified at a time. The available tags are:

    TIAB            title and abstract, separated by a space
    MHs             MeSH terms, separated by "|"
    TIABMHs         title abstract|MeSH 1|MeSH 2|...
    ALL             print everything; duplicates the input MEDLINE citations
    tag in MEDLINE  the field content of the corresponding MEDLINE tag

    The tokenized field content of each MEDLINE citation is printed on one line (except for ALL). Lines appear in the same order as the MEDLINE citations in the input file. Use the -s option to sort them by PMID instead.

  • Examples:
      > mlt -i:in.data -o:out.data -t:TIABMHs
      Retrieve titles, abstracts, and MeSH terms from the input file (in.data) and send the results to the output file (out.data). Each line in the output file contains the title and abstract (separated by a space) followed by the MeSH terms (separated by "|") of one MEDLINE citation, in the same order as in the input file. The output should look like:
      TI1 AB1|MeSH...
      TI2 AB2|MeSH...
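
To make the TIABMHs line format concrete, the following Python sketch builds one such line from a single citation. It is not the mlt implementation. It assumes the usual MEDLINE display layout (a tag of up to four characters, a hyphen in the fifth column, and indented continuation lines), and the function names `parse_citation` and `format_tiabmhs`, as well as the sample citation, are made up for illustration.

```python
from typing import Dict, List, Optional

# Made-up citation in the assumed MEDLINE display layout.
SAMPLE = """PMID- 12345
TI  - A sample title
AB  - First line of the abstract
      continued on a second line.
MH  - Humans
MH  - Medical Informatics
"""


def parse_citation(text: str) -> Dict[str, List[str]]:
    """Parse one MEDLINE-format citation into {tag: [values]}."""
    fields: Dict[str, List[str]] = {}
    tag: Optional[str] = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if len(line) > 4 and line[4] == "-" and line[:4].strip():
            # New field: tag in columns 1-4, hyphen in column 5.
            tag = line[:4].strip()
            fields.setdefault(tag, []).append(line[5:].strip())
        elif tag is not None:
            # Continuation line: append to the most recent value.
            fields[tag][-1] += " " + line.strip()
    return fields


def format_tiabmhs(fields: Dict[str, List[str]]) -> str:
    """Join title and abstract with a space, then MeSH terms with '|'."""
    title = " ".join(fields.get("TI", []))
    abstract = " ".join(fields.get("AB", []))
    tiab = " ".join(part for part in (title, abstract) if part)
    return "|".join([tiab] + fields.get("MH", []))


if __name__ == "__main__":
    print(format_tiabmhs(parse_citation(SAMPLE)))
```

Whether mlt itself keeps MeSH qualifiers or handles citations without an abstract in the same way is not stated here, so the sketch simply passes the MH values through unchanged.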