MTI is the main product of the Indexing Initiative project and has been providing indexing recommendations based on the Medical Subject Headings (MeSH®) vocabulary since 2002. In 2011, NLM expanded MTI's role by designating it as the first-line indexer (MTIFL) for a few journals; today the MTIFL workflow includes over 350 journals, and that number continues to grow. The close collaboration of the NLM Index Section, the Lister Hill National Center for Biomedical Communications, and the Office of Computer & Communications Systems continues to expand and refine MTI's ability to assist the indexers.
MTI provides recommendations to the NLM Indexers. MTIFL, or MTI First Line indexing, partially automates the standard indexing process at the US National Library of Medicine: MTIFL provides the initial indexing for a citation, and a human indexer then reviews this indexing and modifies it as required by adding any missed terms, removing any incorrect terms, and supplying Publication Types. This human curation of MTIFL results is called MTIFL Completion. A more detailed description of MTIFL and a current list of the journals in the MTIFL program are available on the MTIFL Webpage.
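The MTIFL Completion step described above can be thought of as a small set operation over the initial MTIFL terms. The sketch below assumes a hypothetical `CitationIndexing` record and a hypothetical `complete_mtifl` helper; neither is part of MTI or any NLM system, and real MTIFL Completion happens in NLM's indexing tools, not in code like this.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CitationIndexing:
    """Hypothetical record of the indexing attached to one citation."""
    pmid: str
    mesh_terms: set[str] = field(default_factory=set)
    publication_types: set[str] = field(default_factory=set)


def complete_mtifl(initial: CitationIndexing,
                   added: set[str],
                   removed: set[str],
                   publication_types: set[str]) -> CitationIndexing:
    """Apply a human indexer's review to MTIFL's initial indexing.

    Returns a new record; the initial MTIFL indexing is left unchanged.
    """
    # Keep the MTIFL terms the indexer did not reject, then add missed terms.
    final_terms = (initial.mesh_terms - removed) | added
    # Publication Types are supplied by the human indexer, not by MTIFL.
    return CitationIndexing(pmid=initial.pmid,
                            mesh_terms=final_terms,
                            publication_types=set(publication_types))
```

For example, if MTIFL suggested `{"Lung", "Humans", "Mice"}` and the indexer removed `"Mice"`, added `"Male"`, and supplied `"Journal Article"`, the completed record would carry `{"Lung", "Humans", "Male"}` plus that Publication Type. Returning a new record keeps the original MTIFL suggestions available for comparison.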
The NLM Medical Text Indexer System for Indexing Biomedical Literature. J.G. Mork, A. Jimeno Yepes, A.R. Aronson. 2013
This expanded paper incorporates all of the material from our shorter 2013 BioASQ Workshop paper and also contains previously unpublished material, providing a more comprehensive description of the MTI system.
MTI Processing Flow White Paper
There are several different ways to use MTI depending on your needs and how much data you have to process:
[Image: "MTI: NLM Medical Text Indexer, Providing Indexing Assistance Since 2002", with data-flow arrows labeled "Biomedical Literature" → "MTI/MTIFL" → "MeSH Suggestions".]
Here is a link to our general publications webpage and specifically to the section containing all of our MTI-related publications.
The Indexing Life Cycle diagram to the right illustrates how MTI/MTIFL fits into the MEDLINE indexing process and helps enhance access to the Biomedical Literature via MEDLINE.
The Biomedical Literature is first processed by MTI/MTIFL, which provides a set of MeSH Suggestions to the MEDLINE Indexer, who then indexes the journal literature, providing a detailed summary of the topics in each document. The topics are described using some (or all) of the following:
These components are then added into the MEDLINE citation completing the cycle to aid user searches and ultimately enhance access to the document itself.
Current System (2013):
The NLM Medical Text Indexer (MTI) system is the primary product and focus of the Indexing Initiative. MTI produces both semi- and fully-automated indexing recommendations based on the Medical Subject Headings (MeSH®) controlled vocabulary and has been in use at NLM since 2002. MTI is in daily use to assist Indexers, Catalogers, and NLM's History of Medicine Division (HMD) in their indexing efforts.
Every weeknight MTI provides recommendations for approximately 4,000 new citations for Indexing and processes a mixed file of approximately 7,000 old and new records for both Cataloging and HMD. MTI was also used on a regular basis between 2002 and 2012 to provide fully-automated keyword indexing for NLM's Gateway meeting abstract collection, which was not manually indexed. In 2011, MTI was designated as the First-Line Indexer (MTIFL) for 14 journals (89 in 2013) because of its success with those publications. For MTIFL journals, MTI indexing is treated like human indexing and is, of course, subject to the normal manual review process. MEDLINE® Indexers and Revisers consult MTI recommendations for approximately 58% of the articles they index, and the MTI recommendations are tightly integrated into the Cataloging and HMD system. Although MTI is mainly used to index MEDLINE citations consisting of an identifier, title, and abstract, it is also capable of processing arbitrary biomedical text.
MTI provides an ordered list of MeSH Main Headings (MH), Subheadings (SH), and CheckTags (CT) as a final result. MHs are the main descriptors or headings from the MeSH Vocabulary (e.g., Lung). SHs are used to qualify the MHs (e.g., Lung/abnormalities means that the article is about the abnormalities associated with the Lung more than the Lung itself), and CTs are a special type of MHs that are required to be included for each article and cover species, sex, human age groups, historical periods, pregnancy, and various types of research support (e.g., Male).
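To make the three kinds of output concrete, here is a minimal sketch of how an ordered recommendation list might be modeled. The `TermType` enum, the `Recommendation` class, and its fields are hypothetical illustrations, not MTI's actual output format; in particular, pairing each Subheading with the Main Heading it qualifies (rather than listing it as a separate item) is an assumption made for readability.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class TermType(Enum):
    """Hypothetical tags for the kinds of terms in the list."""
    MAIN_HEADING = "MH"
    CHECK_TAG = "CT"


@dataclass(frozen=True)
class Recommendation:
    heading: str                    # MeSH descriptor, e.g. "Lung" or "Male"
    term_type: TermType
    subheading: str | None = None   # optional qualifier, e.g. "abnormalities"

    def label(self) -> str:
        # Heading/subheading notation, e.g. "Lung/abnormalities".
        if self.subheading:
            return f"{self.heading}/{self.subheading}"
        return self.heading


def check_tags(recs: list[Recommendation]) -> list[str]:
    """Return the CheckTags from an ordered list, preserving their order."""
    return [r.heading for r in recs if r.term_type is TermType.CHECK_TAG]
```

With `recs = [Recommendation("Lung", TermType.MAIN_HEADING, "abnormalities"), Recommendation("Male", TermType.CHECK_TAG)]`, the labels in order are `"Lung/abnormalities"` and `"Male"`, and `check_tags(recs)` picks out `"Male"`. Keeping the list ordered matters because MTI's recommendations are ranked.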
Initial Production System (2002):
The MTI system consists of software for applying alternative methods of discovering MeSH headings for citation titles and abstracts and then combining them into an ordered list of recommended indexing terms. The top portion of the diagram consists of three paths, or methods, for creating a list of recommended indexing terms: MetaMap Indexing, Trigrams and PubMed Related Citations. The first two paths actually compute UMLS Metathesaurus® concepts which are passed to the Restrict to MeSH process. The results from each path are weighted and combined using the Clustering process. The system is highly parameterized not only by path weights but also by several parameters specific to the Restrict to MeSH and Clustering processes.
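The idea of weighting each path's results and merging them into one ordered list can be sketched as a weighted score sum. This is a stand-in, not MTI's actual Clustering process, and the path names, the weights in `PATH_WEIGHTS`, and the `top_n` cutoff are all hypothetical values chosen for illustration; the real system's parameters are not given here.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical per-path weights (not MTI's real parameter values).
PATH_WEIGHTS = {
    "metamap": 1.0,
    "trigrams": 0.5,
    "related_citations": 0.8,
}


def combine_paths(path_results: dict[str, dict[str, float]],
                  weights: dict[str, float] = PATH_WEIGHTS,
                  top_n: int = 25) -> list[tuple[str, float]]:
    """Merge per-path MeSH heading scores into one ordered recommendation list.

    path_results maps a path name to {MeSH heading: score} for that path,
    i.e. scores already restricted to MeSH.
    """
    combined: dict[str, float] = defaultdict(float)
    for path, scores in path_results.items():
        w = weights.get(path, 0.0)  # paths without a weight contribute nothing
        for heading, score in scores.items():
            combined[heading] += w * score
    # Highest combined score first; ties broken alphabetically for stable output.
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]
```

With MetaMap scoring Lung 0.9 and Male 0.4, Trigrams scoring Lung 0.6 and Pneumonia 0.8, and Related Citations scoring Pneumonia 0.5 and Humans 0.7, the combined order is Lung (1.2), Pneumonia (0.8), Humans (0.56), Male (0.4). A heading found by several paths rises in the ranking, which is the intuition behind combining independent methods; exposing the weights as a parameter mirrors the text's point that the system is highly parameterized.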
A prototype MTI system described below had two additional indexing methods which were removed because their results were subsumed by the three remaining methods.
Original Indexing Initiative Prototype System (~1996):
The Indexing Initiative Prototype System consists of software for applying alternative methods of discovering MeSH headings for citation titles and abstracts and then combining them into an ordered list of recommended indexing terms. The top portion of the diagram consists of five paths, or methods, for creating a list of recommended indexing terms: the INQUERY method, MetaMap Indexing, Barrier Words with Approximate Matching, Trigrams and PubMed Related Citations. The middle three paths actually compute UMLS Metathesaurus® concepts which are passed to the Restrict to MeSH process, and the outer two paths compute MeSH headings directly. The results from each path are weighted and combined using the Clustering process. The system is highly parameterized not only by path weights but also by several parameters specific to the Restrict to MeSH and Clustering processes.
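The split between paths that produce UMLS Metathesaurus concepts (routed through Restrict to MeSH) and paths that produce MeSH headings directly can be sketched as a simple dispatch. The concept identifiers in `CONCEPT_TO_MESH` are made-up placeholders, and the dictionary lookup is only a toy substitute for the real Restrict to MeSH process, which is considerably more involved; the path names are also hypothetical labels for the five methods listed above.

```python
from __future__ import annotations

# Middle three paths: output is UMLS concepts, so it must be restricted to MeSH.
CONCEPT_PATHS = {"metamap", "barrier_words", "trigrams"}
# Outer two paths: output is already MeSH headings.
MESH_PATHS = {"inquery", "related_citations"}

# Toy mapping with hypothetical concept IDs, standing in for Restrict to MeSH.
CONCEPT_TO_MESH = {
    "C_LUNG": "Lung",
    "C_PNEUMONIA": "Pneumonia",
}


def restrict_to_mesh(concepts: dict[str, float]) -> dict[str, float]:
    """Map concept scores to MeSH headings, keeping the best score per heading."""
    mesh: dict[str, float] = {}
    for concept, score in concepts.items():
        heading = CONCEPT_TO_MESH.get(concept)
        if heading is None:
            continue  # no MeSH counterpart in the toy mapping, so drop it
        mesh[heading] = max(score, mesh.get(heading, float("-inf")))
    return mesh


def to_mesh_scores(path: str, output: dict[str, float]) -> dict[str, float]:
    """Route one path's output so every path ends up scored in MeSH terms."""
    if path in CONCEPT_PATHS:
        return restrict_to_mesh(output)
    if path in MESH_PATHS:
        return dict(output)  # already MeSH headings; copy unchanged
    raise ValueError(f"unknown path: {path}")
```

Normalizing every path to MeSH-keyed scores first is what lets a single combining step treat all paths uniformly, whether there are five paths, as in the prototype, or three, as in the 2002 production system.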