
Text Categorization

JDI: Text and MeSH


  • Description:

    Reads in text and MeSH terms, then performs JD (Journal Descriptor) indexing based on:

    • word frequency count
    • document count for word

    In the input, the text, MH (MeSH headings), and SH (subheadings) are separated by '|'.

  • Inputs:
    • a text and MeSH: text, MH, and SH are separated by '|'
    • a file, such as 9801.2004.TIABMH.in
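
The '|'-separated input described above can be sketched in a few lines of Python. This is only an illustration, not the tool's own parser: the `JdiInput` type and the `parse_input_line` function are hypothetical names, and the sketch assumes each input line holds exactly three fields (text, MH, SH) and that whitespace around each field can be trimmed.

```python
from typing import NamedTuple


class JdiInput(NamedTuple):
    """One input record (hypothetical type, not part of the jdi tool)."""
    text: str
    mh: str
    sh: str


def parse_input_line(line: str) -> JdiInput:
    """Split one 'text|MH|SH' line into its three fields.

    Assumption: exactly three '|'-separated fields per line.
    """
    parts = [part.strip() for part in line.rstrip("\n").split("|")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 '|'-separated fields, got {len(parts)}")
    return JdiInput(*parts)


record = parse_input_line("Heart attack in adults|Myocardial Infarction|therapy\n")
print(record.text, record.mh, record.sh, sep=" / ")
```

Rejecting lines with the wrong number of fields is a design choice of this sketch; the actual tool may handle such lines differently.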

  • Algorithm:
    • Pre-Process (Input Filter):
      • Separate phrase and MeSH

      • Tokenize all words of the input phrase
      • Apply Word Extraction Filter (if it is MEDLINE TI or AB)
      • Apply acronym filter (TBD)
      • Filter out illegal words
      • Filter out duplicated words if unique flag is true
      • Assign the final words for processing

      • Tokenize SH and MH from the input MeSH terms
      • Filter out illegal MeSH terms (those not in the Mh-Jd Table or Sh-Jd Table)
      • Assign legal MeSH terms
    • Process:
    • Post-process (Output Filter):
      • Print out Input term (text and MeSH)
      • Output filter details
      • Score entries display number
      • No output message
      • Cluster option
      • JD candidates
      • Use alphabetical order for JDs that have the same score (Ex: "taylor", "assault")
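
The pre-process and output-ordering steps listed above can be illustrated with a minimal Python sketch. It is not the tool's implementation: the function names, the regular-expression tokenizer, and the use of plain Python sets to stand in for the legal-word list and the Mh-Jd and Sh-Jd tables are all assumptions made for illustration. The Word Extraction Filter and acronym filter are not modeled.

```python
import re


def preprocess_words(text, legal_words, unique=False):
    """Tokenize text, drop illegal words, optionally drop duplicates.

    Assumption: tokens are lowercase runs of letters and digits.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    words = [t for t in tokens if t in legal_words]
    if unique:
        # dict.fromkeys keeps the first occurrence and preserves order
        words = list(dict.fromkeys(words))
    return words


def filter_mesh(terms, table):
    """Keep only terms found in the given table (a set stands in for
    the Mh-Jd or Sh-Jd table here)."""
    return [t for t in terms if t in table]


def rank_jds(scores):
    """Order JDs by descending score, breaking ties alphabetically."""
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


legal = {"heart", "attack"}
print(preprocess_words("Heart heart, the attack", legal, unique=True))
print(filter_mesh(["Myocardial Infarction", "Not A Heading"], {"Myocardial Infarction"}))
print(rank_jds({"Cardiology": 0.5, "Biochemistry": 0.5, "Pediatrics": 0.9}))
```

Sorting on the key `(-score, name)` gives a deterministic order, so two JDs with equal scores always appear in the same alphabetical sequence. The JD names and scores in the usage lines are made-up sample values.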

  • Sample commands:
    > jdi -itmh -d -p
    => index text and MeSH terms read from standard input, with a prompt and detailed scores
    
    > jdi -itmh -d -f:ml -i:9801.2004.TIABMH.in -o:9801.2004.TIABMH.out
    => index text and MeSH terms from the file 9801.2004.TIABMH.in, using the MEDLINE filter options with detailed scores, and send the results to the file 9801.2004.TIABMH.out
    

  • Sample Outputs:
    • a file, such as 9801.2004.TIABMH.out