JDI Methodology
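The NLM (National Library of Medicine®) maintains two broad, relatively small classifications: Journal Descriptors (JDs), which are assigned to journals, and Semantic Types (STs), which are assigned to strings in the UMLS Metathesaurus. JDI (Journal Descriptor Indexing) categorizes text according to the JDs.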
JDI is based on statistical word-JD associations derived from a training set of MEDLINE citations. Each citation contains a journal unique identifier, and the JDs corresponding to that journal are imported into the citation. For example, words in articles in the Journal of Pediatric Surgery become statistically associated with the JDs Pediatrics and Surgery. An input text composed of words similar to those in these articles would then be categorized by the same JDs. Using the words in the input, JDI ranks the JDs by the average of each JD's scores across the word-JD associations. For example, the first three JDs, with scores, returned by JDI for the input "appendectomy in children" are: 1 0.7311 Surgery, 2 0.6856 Pediatrics, and 3 0.4661 Gastroenterology.
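The averaging step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word-JD scores in the table are invented for the example (they are not values from the JDI training set), the names WORD_JD_SCORES and rank_jds are hypothetical, words without any association (such as "in") are simply skipped, and a JD missing for a word is treated as a score of zero.

```python
from collections import defaultdict

# Hypothetical word-JD association scores, invented for illustration only.
# They are NOT values from the actual JDI training set.
WORD_JD_SCORES = {
    "appendectomy": {"Surgery": 0.9, "Pediatrics": 0.5, "Gastroenterology": 0.6},
    "children": {"Surgery": 0.6, "Pediatrics": 0.8, "Gastroenterology": 0.3},
}


def rank_jds(text, word_scores=WORD_JD_SCORES):
    """Rank JDs by the average of their scores over the known words in text."""
    # Assumption: words with no word-JD associations are ignored.
    words = [w for w in text.lower().split() if w in word_scores]
    if not words:
        return []

    totals = defaultdict(float)
    for word in words:
        for jd, score in word_scores[word].items():
            totals[jd] += score

    # Average over all known words; a JD absent for a word contributes zero.
    ranked = [(jd, total / len(words)) for jd, total in totals.items()]
    # Highest average first; ties are broken alphabetically by JD name.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


if __name__ == "__main__":
    for rank, (jd, score) in enumerate(rank_jds("appendectomy in children"), 1):
        print(rank, round(score, 4), jd)
```

With a real association table the same loop would run over every JD in the classification rather than the three shown here.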
The JDI methodology is the basis for STI (Semantic Type Indexing). For each ST, a "document" is created from the UMLS Metathesaurus strings belonging to that ST, and each of these ST documents undergoes JDI. Statistical word-ST associations are then calculated by comparing the JDI of individual training set words with the JDI of the ST documents. Using the words in the input, STI ranks the STs by the average of each ST's scores across the word-ST associations. For example, the first three STs, with scores, returned by STI for the input "appendectomy in children" are: 1 0.5985 Age Group, 2 0.5520 Finding, and 3 0.5498 Therapeutic or Preventive Procedure. That is, the average Age Group score for the words in the input is higher than the average score for any other ST. An alternate method of STI compares the JDI of the input as a whole to the JDI of each ST document, and ranks the STs by how similar the input's JDI is to that of their ST documents. By this method, the JDI of this input is most similar to the JDI of the Age Group document.
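The alternate STI method, which compares the JDI of the input with the JDI of each ST document, might look like the sketch below. Several things here are assumptions rather than details taken from the text: the similarity measure is not named above, so cosine similarity is used as a stand-in; the input vector keeps only the three JD scores quoted earlier for "appendectomy in children"; and the ST document vectors are invented for illustration. The function and variable names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse JD-score vectors (dicts)."""
    dot = sum(score * b.get(jd, 0.0) for jd, score in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_sts_by_similarity(input_jdi, st_document_jdi):
    """Rank STs by similarity of the input's JDI to each ST document's JDI."""
    ranked = [(st, cosine(input_jdi, doc)) for st, doc in st_document_jdi.items()]
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


# JDI of the input, truncated to the three JD scores quoted in the text.
INPUT_JDI = {"Surgery": 0.7311, "Pediatrics": 0.6856, "Gastroenterology": 0.4661}

# Hypothetical JDI vectors for three ST documents, invented for illustration.
ST_DOCUMENT_JDI = {
    "Age Group": {"Surgery": 0.6, "Pediatrics": 0.6, "Gastroenterology": 0.4},
    "Finding": {"Surgery": 0.3, "Pediatrics": 0.3, "Gastroenterology": 0.6},
    "Therapeutic or Preventive Procedure": {
        "Surgery": 0.8, "Pediatrics": 0.2, "Gastroenterology": 0.3,
    },
}

if __name__ == "__main__":
    for st, sim in rank_sts_by_similarity(INPUT_JDI, ST_DOCUMENT_JDI):
        print(round(sim, 4), st)
```

The word-average form of STI has the same shape as JDI ranking, with word-ST associations in place of word-JD associations, so it is not repeated here.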
Web-based tools for performing JDI and STI have been developed in Java as part of the TC (Text Categorization) project.
JDI and STI have actual and potential applications, in particular when embedded in programs in the SKR (Semantic Knowledge Representation) project. For example, JDI is being used by SemRep, a natural language processing (NLP) program; JDI increases accuracy by identifying MEDLINE citations in the molecular genetics domain before NLP begins. STI has been applied to word sense disambiguation (WSD). If the senses of an ambiguous word are expressed by candidate STs for its meaning, STI can be performed on the context surrounding the word (phrase, sentence, or abstract), in the expectation that, in the STI of the context, the correct ST for the word will rank higher than the other candidate STs. STI is being evaluated for performing WSD in MetaMap.
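The selection step in STI-based word sense disambiguation can be sketched as follows, assuming the STI scores for the context have already been computed. The function name disambiguate is hypothetical, and the context scores and the candidate STs for the ambiguous word "cold" are invented for illustration; this is not how MetaMap implements the step.

```python
def disambiguate(context_st_scores, candidate_sts):
    """Return the candidate ST that scores highest in the STI of the context.

    context_st_scores: mapping of ST name to its STI score for the context.
    candidate_sts: the STs that express the senses of the ambiguous word.
    Returns None when no candidate ST has a score for this context.
    """
    scored = [
        (st, context_st_scores[st])
        for st in candidate_sts
        if st in context_st_scores
    ]
    if not scored:
        return None
    # Only candidate STs compete; other STs are ignored even if they rank higher.
    return max(scored, key=lambda pair: pair[1])[0]


# Hypothetical STI scores for a context such as a sentence about a patient
# with a cough and a cold (values invented for illustration).
CONTEXT_ST_SCORES = {
    "Disease or Syndrome": 0.62,
    "Finding": 0.55,
    "Natural Phenomenon or Process": 0.41,
}

# Hypothetical candidate STs for two senses of the ambiguous word "cold".
COLD_CANDIDATES = ["Natural Phenomenon or Process", "Disease or Syndrome"]

if __name__ == "__main__":
    print(disambiguate(CONTEXT_ST_SCORES, COLD_CANDIDATES))
```

Restricting the comparison to the candidate STs matters: an unrelated ST such as Finding may outrank one of the candidates in the context without being a possible sense of the word.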
Go to References for links to full-text papers explaining the JDI methodology.