Text Categorization

PreProcess: Mh-Sh-Jdid-Dc (from MEDLINE)

  • Description:
    JDI-MeSH is built on a training set of MEDLINE citations. The first step in establishing this training set is to extract the Jdids, JDs, and starred MeSH (Mh & Sh) information from MEDLINE. Two files are generated for MeSH main headings and two for MeSH subheadings: one holding the total document count and one holding the document count per Jdid. These files are used to calculate the final scores in the MH-Jdid-Dc and SH-Jdid-Dc tables.

  • Input:
    MEDLINE training set for tc2007
    • MEDLINE 2004: /nfsvol/indaux/MEDLINE_baseline/2004/medline04n${NUM}.txt
    • Date created (DA) in years: 1999, 2000, 2001
    • ${NUM} stands for the file numbers of the files that include citations with DA in 1999, 2000, or 2001

    • jds.txt
    • jidTaJds.txt
    • shs.txt

  • Java File & Algorithm:
    • GenerateFilesFromMedLine.java:
      • Read in all fields (PMID, TI, AB, TA, JID, RN, MH) from MEDLINE citations if DA is within the specified range
      • Read in JD information through the JID for each citation
      • Check whether DA (date created) is in the specified years
      • Check whether the citation has JDs
      • Update the MH document count and the MH-JD document count
      • Update the SH document count and the SH-JD document count
      • Print out the total document count for MH, MH-JDID, SH, and SH-JDID, respectively:
        • Send MH, DC to mhDc.txt
        • Send MH-JDID, DC to mhJdidDc.txt
        • Send SH, DC to shDc.txt
        • Send SH-JDID, DC to shJdidDc.txt
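
    The counting steps above can be sketched roughly as follows. This is a minimal illustration and not the actual GenerateFilesFromMedLine.java: the class and field names, the shape of a parsed citation, and the "|" separator in the combined keys are all assumptions made for the example. It assumes each citation has already been parsed into its DA year, JID, and sets of starred MH and SH terms, and that a JID-to-JDID lookup has been loaded beforehand.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch of the document-count bookkeeping (not the original program). */
class DocCountSketch {
    /** Assumed shape of one parsed citation. */
    static final class Citation {
        final int daYear;        // year taken from the DA (date created) field
        final String jid;        // journal ID
        final Set<String> mhs;   // starred main headings in this citation
        final Set<String> shs;   // starred subheadings in this citation

        Citation(int daYear, String jid, Set<String> mhs, Set<String> shs) {
            this.daYear = daYear;
            this.jid = jid;
            this.mhs = mhs;
            this.shs = shs;
        }
    }

    // term -> document count, and "term|JDID" -> document count.
    // The "|" key separator is an assumption for this sketch.
    final Map<String, Integer> mhDc = new TreeMap<>();
    final Map<String, Integer> mhJdidDc = new TreeMap<>();
    final Map<String, Integer> shDc = new TreeMap<>();
    final Map<String, Integer> shJdidDc = new TreeMap<>();

    private final Set<Integer> years;
    private final Map<String, Set<String>> jidToJdids;

    DocCountSketch(Set<Integer> years, Map<String, Set<String>> jidToJdids) {
        this.years = years;
        this.jidToJdids = jidToJdids;
    }

    /** Counts one citation; returns false if it is skipped. */
    boolean add(Citation c) {
        // Skip citations whose DA year is outside the specified range.
        if (!years.contains(c.daYear)) {
            return false;
        }
        // Skip citations whose journal has no JDs.
        Set<String> jdids = jidToJdids.getOrDefault(c.jid, Collections.<String>emptySet());
        if (jdids.isEmpty()) {
            return false;
        }
        update(c.mhs, jdids, mhDc, mhJdidDc);
        update(c.shs, jdids, shDc, shJdidDc);
        return true;
    }

    // Terms are held in a Set, so each term is counted once per document.
    private static void update(Set<String> terms, Set<String> jdids,
                               Map<String, Integer> dc, Map<String, Integer> jdidDc) {
        for (String term : terms) {
            dc.merge(term, 1, Integer::sum);
            for (String jdid : jdids) {
                jdidDc.merge(term + "|" + jdid, 1, Integer::sum);
            }
        }
    }
}
```

    After all citations have been added, the four maps hold the counts that would be written to mhDc.txt, mhJdidDc.txt, shDc.txt, and shJdidDc.txt. The line format of those files is not specified here, so the sketch stops at the in-memory counts.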

  • Output File:
    • mhDc.txt
    • mhJdidDc.txt
    • shDc.txt
    • shJdidDc.txt

  • Notes:
    • Make sure all JDs are defined in both files, jds.txt and jidTaJds.txt. Otherwise, the program generates an error message when it reaches a JD that appears in jidTaJds.txt but not in the JD list.
    • These files are generated along with all other files from MEDLINE.
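
    The consistency requirement in the first note could be checked before running the program, along the lines of the sketch below. The file formats of jds.txt and jidTaJds.txt are not described here, so the example assumes both have already been parsed: the JD list into a set, and the journal file into a map from JID to its JDs. The class and method names are hypothetical.

```java
import java.util.Collection;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical pre-check: find JDs used by journals but missing from the JD list. */
class JdConsistencyCheck {
    /**
     * @param definedJds JDs from the JD list (assumed parsed from jds.txt)
     * @param jidToJds   JID -> JDs (assumed parsed from jidTaJds.txt)
     * @return sorted set of JDs that appear in jidToJds but not in definedJds
     */
    static Set<String> undefinedJds(Set<String> definedJds,
                                    Map<String, ? extends Collection<String>> jidToJds) {
        Set<String> missing = new TreeSet<>();
        for (Collection<String> jds : jidToJds.values()) {
            for (String jd : jds) {
                if (!definedJds.contains(jd)) {
                    missing.add(jd);
                }
            }
        }
        return missing;
    }
}
```

    An empty result means every JD referenced by a journal is defined, so the error described in the note should not occur.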