
Text Categorization

PreProcess: JDs (Journal Descriptors)

  • Description:
    Journal Descriptors (JDs) are preferred MeSH terms that describe journals. Each journal has an ID, called a JID, and each JID is associated with one or more JDs. In the Lisp system, the 122 JDs (preferred MeSH terms) are stored in jd-abbr-table. This information is included in the List of Serials Indexed file (lsi${YEAR}.xml). Since the 2007 release, the JD list (jds.txt) has been derived from lsi${YEAR}.xml.

  • Input:
    • ftp://ftp.nlm.nih.gov/online/journals/lsi2007.xml
    • jds.txt (from the previous version)

  • Java File & Algorithm:
    • GenerateJidTaJdsFromLsi.java
      • Parse the lsi${YEAR}.xml file
      • Find the XML tag <NlmUniqueID> for the journal ID (JID)
      • Find the XML tag <MedlineTA> for the journal title abbreviation (TA)
      • Find the XML tag <BroadJournalHeadingList>, which marks the beginning of the JDs
      • Find the XML tag <BroadJournalHeading> for each Journal Descriptor (JD)
      • Write the information in the new format to the file jidTaJds.out
      • Write the information in the new format to the file jds.txt
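The parsing steps above can be sketched with the JDK's StAX API (javax.xml.stream). This is a minimal sketch, not the actual GenerateJidTaJdsFromLsi.java: it assumes that within each journal record <NlmUniqueID> appears before <MedlineTA> and <BroadJournalHeadingList>, and the pipe-delimited output line (JID|TA|JD1|JD2|...), the class name, and the sample record are all hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class JidTaJdsSketch {
    // One journal record: JID, MEDLINE title abbreviation (TA), and its JDs
    static final class Journal {
        String jid = "";
        String ta = "";
        final List<String> jds = new ArrayList<>();

        // Hypothetical output format: JID|TA|JD1|JD2|...
        String toLine() {
            StringBuilder sb = new StringBuilder(jid).append('|').append(ta);
            for (String jd : jds) sb.append('|').append(jd);
            return sb.toString();
        }
    }

    static List<Journal> parse(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<Journal> journals = new ArrayList<>();
        Journal current = null;
        while (r.hasNext()) {
            if (r.next() != XMLStreamConstants.START_ELEMENT) continue;
            switch (r.getLocalName()) {
                case "NlmUniqueID": // assumption: a new JID starts a new journal record
                    current = new Journal();
                    current.jid = r.getElementText().trim();
                    journals.add(current);
                    break;
                case "MedlineTA":
                    if (current != null) current.ta = r.getElementText().trim();
                    break;
                case "BroadJournalHeadingList": // beginning of this journal's JDs
                    if (current != null) current.jds.clear();
                    break;
                case "BroadJournalHeading":
                    if (current != null) current.jds.add(r.getElementText().trim());
                    break;
                default:
                    break;
            }
        }
        r.close();
        return journals;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Made-up record for illustration only
        String xml = "<SerialsSet><Serial>"
                + "<NlmUniqueID>J0001</NlmUniqueID>"
                + "<MedlineTA>J Example</MedlineTA>"
                + "<BroadJournalHeadingList>"
                + "<BroadJournalHeading>Genetics</BroadJournalHeading>"
                + "<BroadJournalHeading>Behavioral Sciences</BroadJournalHeading>"
                + "</BroadJournalHeadingList></Serial></SerialsSet>";
        for (Journal j : parse(xml)) System.out.println(j.toLine());
    }
}
```

For a real lsi${YEAR}.xml file, the StringReader could be replaced by an InputStream passed to XMLInputFactory.createXMLStreamReader. Streaming with StAX avoids loading the whole serials file into a DOM tree.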

  • Output File:
    jds.txt, used in TC.JDI and TC.STRI
    Columns: Index | JD Id | JD Name | Status
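A minimal sketch of reading jds.txt. The exact file layout is not given here, so this assumes one JD per line with the four columns separated by `|`; the class, record, and sample rows are hypothetical. It also filters on the Active/Inactive status listed in the notes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class JdsFileSketch {
    enum Status { ACTIVE, INACTIVE }

    // One row of jds.txt: Index, JD Id, JD Name, Status
    record JdEntry(int index, String jdId, String name, Status status) {}

    // Assumed layout: "Index|JD Id|JD Name|Status", e.g. "1|JD01|Anthropology|Active"
    static JdEntry parseLine(String line) {
        String[] f = line.split("\\|", -1);
        if (f.length != 4) {
            throw new IllegalArgumentException("Expected 4 '|'-separated fields: " + line);
        }
        return new JdEntry(
                Integer.parseInt(f[0].trim()),
                f[1].trim(),
                f[2].trim(),
                Status.valueOf(f[3].trim().toUpperCase(Locale.ROOT)));
    }

    // Parse all non-blank lines and keep only the Active JDs
    static List<JdEntry> activeOnly(List<String> lines) {
        List<JdEntry> active = new ArrayList<>();
        for (String line : lines) {
            if (line.isBlank()) continue;
            JdEntry e = parseLine(line);
            if (e.status() == Status.ACTIVE) active.add(e);
        }
        return active;
    }

    public static void main(String[] args) {
        // Made-up rows for illustration only
        List<String> lines = List.of(
                "1|JD01|Anthropology|Active",
                "2|JD02|Old Example JD|Inactive");
        for (JdEntry e : activeOnly(lines)) System.out.println(e);
    }
}
```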
  • Notes:
    • Journal descriptors change every year.
    • The file is sorted by JD ID (by version, then alphabetically).
    • Status: Active, Inactive
    • There are differences in the JDs between versions:
      • Susanne's file (used in the 2004 training set) & lsi2006.xml:
        jd-abbr-table | lsi2006.xml | Notes
        Anthropology, Physical | Anthropology | -
        Antibiotics | Anti-Bacterial Agents | -
        Behavior | Behavioral Sciences | -
        Delivery of Health Care | Health Services | -
        Family Planning | Family Planning Services | -
        Genetics, Behavioral | Behavioral Sciences; Genetics | -
        - | Library Science | -
        - | Research | Not a valid JD, should be removed
        - | Tuberculosis | Not a valid JD, should be removed
      • lsi2006.xml & lsi2007.xml:
        lsi2006.xml | lsi2007.xml
        Nutrition | Nutritional Sciences

      • Different JD sets generate different JDI training sets and results, so we compare results using the similarity on the JDs common to both versions.
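The version differences and the common-JD comparison described above can be sketched as follows. This is a hypothetical illustration: the rename maps are transcribed from the two comparison tables (the invalid Research and Tuberculosis entries are left out), renaming old JDs before intersecting is an assumption, and the class and method names are made up. The similarity measure itself is not specified here, so the sketch only restricts per-JD scores to the common JDs.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class JdVersionCompare {
    // jd-abbr-table (2004 training set) -> lsi2006.xml; "Genetics, Behavioral" maps to two JDs
    static final Map<String, List<String>> ABBR_TO_2006 = Map.of(
            "Anthropology, Physical", List.of("Anthropology"),
            "Antibiotics", List.of("Anti-Bacterial Agents"),
            "Behavior", List.of("Behavioral Sciences"),
            "Delivery of Health Care", List.of("Health Services"),
            "Family Planning", List.of("Family Planning Services"),
            "Genetics, Behavioral", List.of("Behavioral Sciences", "Genetics"));

    // lsi2006.xml -> lsi2007.xml
    static final Map<String, List<String>> LSI2006_TO_2007 = Map.of(
            "Nutrition", List.of("Nutritional Sciences"));

    // Rename old JDs; JDs without a rename entry keep their original name
    static Set<String> migrate(Collection<String> oldJds, Map<String, List<String>> renames) {
        Set<String> out = new TreeSet<>();
        for (String jd : oldJds) out.addAll(renames.getOrDefault(jd, List.of(jd)));
        return out;
    }

    // JDs present in both versions
    static Set<String> commonJds(Set<String> a, Set<String> b) {
        Set<String> common = new TreeSet<>(a);
        common.retainAll(b);
        return common;
    }

    // Keep only the per-JD scores for JDs in the common set
    static Map<String, Double> restrictToCommon(Map<String, Double> scores, Set<String> common) {
        Map<String, Double> out = new TreeMap<>();
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            if (common.contains(e.getKey())) out.put(e.getKey(), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up JD lists for illustration only
        List<String> oldJds = List.of("Behavior", "Genetics, Behavioral", "Nutrition");
        Set<String> jds2006 = Set.of("Behavioral Sciences", "Genetics", "Library Science", "Nutrition");
        Set<String> common = commonJds(migrate(oldJds, ABBR_TO_2006), jds2006);
        System.out.println("Common JDs: " + common);
    }
}
```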