Introduction

This document outlines a number of enhancements included in MetaMap 2009, the most important of which are

  1. NegEx enhancements,
  2. More options for XML generation,
  3. A new sentence-breaking algorithm,
  4. A new input format,
  5. Elimination of Moderate Model, and
  6. Elimination of display_original_phrases command-line option.

Other less visible changes, which will be mentioned but not described further, are various bug fixes involving the exclude_sources option, the display_original_phrases option, the term_processing option, positional information, and Fielded MMI Output. We also added several new Acronym/Abbreviation-detection rules. Finally, we upgraded MetaMap from Berkeley DB 3.0.55 to 4.1.24; this last change is completely transparent to users, but you will need to keep multiple versions of the database files if you want to run both MetaMap09 and any previous release on the same filesystem.

NegEx Enhancements

MetaMap's initial implementation of NegEx was first described in the original MetaMap08 Release Notes; see The NegEx Project Page for a full explanation of NegEx. MetaMap 2009 includes the following enhancements to our initial implementation, all of which have been reviewed with Wendy Chapman.

Previously, NegEx output appeared only in MetaMap Machine Output (generated via the -q flag). Beginning with MetaMap 2009, if the --negex option is specified and default human-readable output is requested, NegEx output (which is computed for the entire citation) will be displayed before the default human-readable output for the citation. For example, the input

No pneumonia. Checked for infiltrates.

will generate the following human-readable output, in which the NegEx output is the NEGATIONS block at the top:

	    
	      NEGATIONS:
	      Negation Type:     nega
	      Negation Trigger:  no
	      Negation PosInfo:  0/2
	      Negated  Concept:  C0032285:Pneumonia
	      Concept  PosInfo:  3/10

	      Negation Type:     nega
	      Negation Trigger:  checked for
	      Negation PosInfo:  14/11
	      Negated  Concept:  C0332448:Infiltrates
	      Concept  PosInfo:  26/12
	    

	    Processing 00000000.tx.1: No pneumonia. 

	    Phrase: "No pneumonia."
	    Meta Candidates (3):
	    1000 Pneumonia [Disease or Syndrome]
	    907 Lung [Body Part, Organ, or Organ Component]
	    907 Lung (Entire lung) [Body Part, Organ, or Organ Component]
	    Meta Mapping (1000):
	    1000 Pneumonia [Disease or Syndrome]
	    Processing 00000000.tx.2: Checked for infiltrates.

	    Phrase: "Checked"
	    Meta Candidates (1):
	    966 Check (Checking) [Qualitative Concept]
	    Meta Mapping (966):
	    966 Check (Checking) [Qualitative Concept]

	    Phrase: "for infiltrates."
	    Meta Candidates (3):
	    1000 Infiltrates (Infiltration) [Pathologic Function,Therapeutic or Preventive Procedure]
	    966 Infiltrate [Intellectual Product]
	    966 Infiltrate (Administration Method - Infiltrate) [Functional Concept]
	    Meta Mapping (1000):
	    1000 Infiltrates (Infiltration) [Pathologic Function,Therapeutic or Preventive Procedure]
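
The PosInfo values in the NEGATIONS block read as start/length pairs (the XML output later in these notes names the two parts StartPos and SpanLen). The following Python sketch shows how such a pair addresses the input text. It assumes zero-based character offsets into the raw input, which is consistent with the example above; the helper name span_text is hypothetical and not part of MetaMap.

```python
def span_text(text: str, posinfo: str) -> str:
    """Return the substring addressed by a 'start/length' PosInfo string."""
    start, length = (int(part) for part in posinfo.split("/"))
    return text[start:start + length]


text = "No pneumonia. Checked for infiltrates."
print(span_text(text, "0/2"))    # No
print(span_text(text, "14/11"))  # Checked for
print(span_text(text, "3/10"))   # pneumonia.
```

Note that in this example the concept spans (3/10 and 26/12) include the sentence-final period.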
	  

The most useful form of NegEx output, however, remains in Machine Output and especially XML output, presented below.

We also added patf and neop to the list of Semantic Types that can license a negation in order to capture negations in phrases such as no infiltrates (patf) and no cancer (neop). Finally, we added with no evidence of to the list of negation triggers, and the five expressions other than, otherwise, to account for, to explain, and then to the list of scope-decreasing conjunctions.

The two most significant changes to our NegEx implementation, however, are the inclusion of the CUI of negated concepts (as displayed above), and collapsing multiple negations into a single one if multiple concepts are associated with the negated term. For example, in previous versions of MetaMap, the text no abnormality would have generated these negations, in which the Machine-Output terms are pretty-printed for readability:


	    negation(nega,
	             no, [0/2],
	             'ABNORMALITY', [3/12])

	    negation(nega,
	             no, [0/2],
	             'Abnormality', [3/12])
	  

This output incorrectly suggests that there are two distinct negations, whereas there is of course only one, although the negated term is associated with two concepts. In MetaMap 2009, the NegEx output will instead be the following (again, pretty-printed for readability):


	    negation(nega,
            no, [0/2],
            ['C1704258':'Abnormality','C0000768':'ABNORMALITY'], [3/12])
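
The collapsing step can be pictured as a grouping operation. The Python sketch below is illustrative only and is not MetaMap's implementation; the Negation tuple layout and the function name collapse_negations are assumptions made for the example. Negations that share the same type, trigger, trigger position, and concept position are merged, and their CUI:Concept pairs are gathered into one list. (The old-style terms carried no CUIs; they are included in the sample data so that the merged result resembles the new form.)

```python
from typing import NamedTuple


class Negation(NamedTuple):
    """One old-style negation: a single concept per negation (hypothetical layout)."""
    neg_type: str
    trigger: str
    trigger_pos: tuple   # (start, length)
    cui: str
    concept: str
    concept_pos: tuple   # (start, length)


def collapse_negations(negations):
    """Merge negations that differ only in the negated concept."""
    grouped = {}  # dicts keep insertion order, so output order is stable
    for neg in negations:
        key = (neg.neg_type, neg.trigger, neg.trigger_pos, neg.concept_pos)
        grouped.setdefault(key, []).append((neg.cui, neg.concept))
    # Each result has five fields, with the concepts always held in a list.
    return [
        (neg_type, trigger, trigger_pos, concepts, concept_pos)
        for (neg_type, trigger, trigger_pos, concept_pos), concepts in grouped.items()
    ]


old_style = [
    Negation("nega", "no", (0, 2), "C1704258", "Abnormality", (3, 12)),
    Negation("nega", "no", (0, 2), "C0000768", "ABNORMALITY", (3, 12)),
]
collapsed = collapse_negations(old_style)
print(len(collapsed))   # 1
print(collapsed[0][3])  # [('C1704258', 'Abnormality'), ('C0000768', 'ABNORMALITY')]
```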
	  

The concepts are displayed as CUI:Concept pairs in a list. (Even if there is only a single concept, that concept will still be displayed in a list, for consistency.) The formatted XML version of the above negation/5 term is the following:


	    <Negations Count="1">
	    <Negation>
	    <NegType>nega</NegType>
	    <NegTrigger>no</NegTrigger>
	    <NTSpans Count="1">
	    <Span>
	    <StartPos>0</StartPos>
	    <SpanLen>2</SpanLen>
	    </Span>
	    </NTSpans>
	    <CUIConcepts Count="2">
	    <CUIConcept>
	    <NegExCUI>C1704258</NegExCUI>
	    <NegExConcept>Abnormality</NegExConcept>
	    </CUIConcept>
	    <CUIConcept>
	    <NegExCUI>C0000768</NegExCUI>
	    <NegExConcept>ABNORMALITY</NegExConcept>
	    </CUIConcept>
	    </CUIConcepts>
	    <NCSpans Count="1">
	    <Span>
	    <StartPos>3</StartPos>
	    <SpanLen>12</SpanLen>
	    </Span>
	    </NCSpans>
	    </Negation>
	    </Negations>
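
For readers who consume this XML programmatically, here is a minimal Python sketch that reads a Negations element using the standard library's xml.etree.ElementTree. The element names are taken from the example above; the function name read_negations and the shape of the returned dictionaries are assumptions made for illustration, not part of MetaMap.

```python
import xml.etree.ElementTree as ET

# The sample from the release notes, compacted onto fewer lines.
NEGATIONS_XML = """
<Negations Count="1">
 <Negation>
  <NegType>nega</NegType>
  <NegTrigger>no</NegTrigger>
  <NTSpans Count="1">
   <Span><StartPos>0</StartPos><SpanLen>2</SpanLen></Span>
  </NTSpans>
  <CUIConcepts Count="2">
   <CUIConcept><NegExCUI>C1704258</NegExCUI><NegExConcept>Abnormality</NegExConcept></CUIConcept>
   <CUIConcept><NegExCUI>C0000768</NegExCUI><NegExConcept>ABNORMALITY</NegExConcept></CUIConcept>
  </CUIConcepts>
  <NCSpans Count="1">
   <Span><StartPos>3</StartPos><SpanLen>12</SpanLen></Span>
  </NCSpans>
 </Negation>
</Negations>
"""


def _span(span_elem):
    """Convert a <Span> element to a (start, length) tuple of ints."""
    return (int(span_elem.findtext("StartPos")), int(span_elem.findtext("SpanLen")))


def read_negations(xml_text):
    """Return one dict per <Negation> element found under <Negations>."""
    root = ET.fromstring(xml_text.strip())
    results = []
    for neg in root.findall("Negation"):
        results.append({
            "type": neg.findtext("NegType"),
            "trigger": neg.findtext("NegTrigger"),
            "trigger_spans": [_span(s) for s in neg.findall("NTSpans/Span")],
            "concepts": [
                (c.findtext("NegExCUI"), c.findtext("NegExConcept"))
                for c in neg.findall("CUIConcepts/CUIConcept")
            ],
            "concept_spans": [_span(s) for s in neg.findall("NCSpans/Span")],
        })
    return results


negations = read_negations(NEGATIONS_XML)
print(negations[0]["concepts"])
# [('C1704258', 'Abnormality'), ('C0000768', 'ABNORMALITY')]
```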
	  

The concepts that are marked for negation must have one of the following semantic types:

  Semantic Type                       Abbrev
  Acquired Abnormality                acab
  Anatomical Abnormality              anab
  Biologic Function                   biof
  Congenital Abnormality              cgab
  Cell or Molecular Dysfunction       comd
  Disease or Syndrome                 dsyn
  Experimental Model of Disease       emod
  Finding                             fndg
  Injury or Poisoning                 inpo
  Laboratory or Test Result           lbtr
  Mental Process                      menp
  Mental or Behavioral Dysfunction    mobd
  Neoplastic Process                  neop
  Pathologic Function                 patf
  Physiologic Function                phsf
  Sign or Symptom                     sosy
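
When post-processing MetaMap output, the semantic-type restriction can be expressed as a simple membership test. This Python sketch is illustrative only; it assumes that a concept qualifies when at least one of its semantic types appears in the list above, and the names NEGATABLE_SEMTYPES and can_be_negated are hypothetical. The abbreviations qlco (Qualitative Concept) and topp (Therapeutic or Preventive Procedure) in the usage lines are standard UMLS abbreviations that do not appear in the list.

```python
# Abbreviations copied from the semantic-type list in these notes.
NEGATABLE_SEMTYPES = frozenset({
    "acab", "anab", "biof", "cgab", "comd", "dsyn", "emod", "fndg",
    "inpo", "lbtr", "menp", "mobd", "neop", "patf", "phsf", "sosy",
})


def can_be_negated(semtypes):
    """True if at least one of the concept's semantic types is in the list."""
    return any(st in NEGATABLE_SEMTYPES for st in semtypes)


print(can_be_negated(["dsyn"]))          # Pneumonia: True
print(can_be_negated(["patf", "topp"]))  # Infiltration: True
print(can_be_negated(["qlco"]))          # Checking: False
```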

XML Generation

XML generation was first included in the initial release of MetaMap08, as described in the original MetaMap08 Release Notes, and was revised in MetaMap08 V2, as described in the MetaMap08 V2 Release Notes.

After receiving much feedback from users, we are providing in MetaMap09 greater flexibility in the available forms of XML output. Previously, users could request XML output only as either --XML format or --XML noformat (or, equivalently, -% format or -% noformat). The format option generates formatted, pretty-printed XML, whereas the noformat option generates more compact, unformatted XML that is not human-readable. Regardless of the formatting option chosen, however, previous versions of MetaMap generated one XML document for each input record or citation.

Beginning with MetaMap09, we are providing two additional XML options: format1 and noformat1. At a high level, the behavior of format and noformat will remain the same, in that MetaMap will generate an XML document for each input record or citation; the only difference will be the new <MMOlist> element (see below). The XML output for format1 and noformat1, however, will contain one XML document for each input file.

For example, calling metamap09 -% format1 on the following text


	    Heart attack.

	    Lung cancer.
	  

will generate the XML output


	    <?xml version="1.0" encoding="UTF-8"?>
	    <!DOCTYPE MMO PUBLIC "-//NLM//DTD MetaMap Machine Output//EN"
            "http://ii-public.nlm.nih.gov/DTD/MMOtoXML_v2.dtd">
	    <MMOlist>
	    <MMO>

	    . . . XML output for Heart attack. . . .

	    </MMO>
	    <MMO>

	    . . . XML output for Lung cancer. . . .

	    </MMO>
	    </MMOlist>

	  

Similarly, calling metamap09 -% format on the same text will generate the XML output


	    <?xml version="1.0" encoding="UTF-8"?>
	    <!DOCTYPE MMO PUBLIC "-//NLM//DTD MetaMap Machine Output//EN"
            "http://ii-public.nlm.nih.gov/DTD/MMOtoXML_v2.dtd">
	    <MMOlist>
	    <MMO>

	    . . . XML output for Heart attack. . . .

	    </MMO>
	    </MMOlist>

	    <?xml version="1.0" encoding="UTF-8"?>
	    <!DOCTYPE MMO PUBLIC "-//NLM//DTD MetaMap Machine Output//EN"
            "http://ii-public.nlm.nih.gov/DTD/MMOtoXML_v2.dtd">
	    <MMOlist>
	    <MMO>

	    . . . XML output for Lung cancer. . . .

	    </MMO>
	    </MMOlist>
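
The practical consequence for downstream tools is that format and noformat output is a concatenation of several XML documents and cannot be handed to an XML parser in one piece, whereas format1 and noformat1 output can. The Python sketch below, offered as one possible approach rather than a recommended one, splits concatenated output at each XML declaration and parses the pieces separately. The function name split_documents is hypothetical, and the empty MMO elements stand in for the elided per-citation output shown above.

```python
import re
import xml.etree.ElementTree as ET

HEADER = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!DOCTYPE MMO PUBLIC "-//NLM//DTD MetaMap Machine Output//EN"\n'
    '        "http://ii-public.nlm.nih.gov/DTD/MMOtoXML_v2.dtd">\n'
)

# Placeholder output: <MMO/> replaces the elided content for each citation.
FORMAT_OUTPUT = (
    HEADER + "<MMOlist>\n<MMO/>\n</MMOlist>\n\n"
    + HEADER + "<MMOlist>\n<MMO/>\n</MMOlist>\n"
)
FORMAT1_OUTPUT = HEADER + "<MMOlist>\n<MMO/>\n<MMO/>\n</MMOlist>\n"


def split_documents(stream_text):
    """Split text holding one or more XML documents at each XML declaration."""
    starts = [m.start() for m in re.finditer(r"<\?xml\s", stream_text)]
    ends = starts[1:] + [len(stream_text)]
    return [stream_text[a:b].strip() for a, b in zip(starts, ends)]


def mmo_counts(stream_text):
    """Parse every document in the stream; return the MMO count of each."""
    counts = []
    for doc in split_documents(stream_text):
        # Parsing bytes lets the parser honor the declared encoding.
        root = ET.fromstring(doc.encode("utf-8"))
        counts.append(len(root.findall("MMO")))
    return counts


print(mmo_counts(FORMAT_OUTPUT))   # [1, 1]
print(mmo_counts(FORMAT1_OUTPUT))  # [2]
```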

	  

Sentence-Breaking Algorithm

MetaMap09 now considers the semi-colon (";") as a sentence-breaking character, in addition to the period ("."), the exclamation point ("!"), and the question mark ("?"). These characters are considered sentence boundaries, however, only if they are not immediately preceded by whitespace.
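
The rule as stated can be sketched in a few lines of Python. This is a deliberately naive illustration of the single condition described above, not MetaMap's algorithm, which may well apply further checks (for abbreviations or decimal numbers, for instance). The function name split_sentences is hypothetical, and treating a breaking character at the very start of the text as a non-boundary is an assumption.

```python
SENTENCE_BREAKERS = frozenset(".!?;")


def split_sentences(text):
    """Break text after . ! ? ; unless the character follows whitespace."""
    sentences = []
    start = 0
    for i, ch in enumerate(text):
        # A breaker counts only if the preceding character is not whitespace.
        if ch in SENTENCE_BREAKERS and i > 0 and not text[i - 1].isspace():
            sentence = text[start:i + 1].strip()
            if sentence:
                sentences.append(sentence)
            start = i + 1
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("No pneumonia; checked for infiltrates."))
# ['No pneumonia;', 'checked for infiltrates.']
print(split_sentences("Fever ; cough."))
# ['Fever ; cough.']
```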

New Input Format

All previous releases of MetaMap required ASCII 10 (newline or linefeed) as a line-terminating character. In addition, input records or citations had to be separated by blank lines. We have extended MetaMap's input logic to allow ASCII 13 (carriage return) as a line-terminating character as well.
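
As an illustration of the input convention, the following Python sketch splits raw input into records separated by blank lines while accepting either line terminator. It is not MetaMap's input code; the function name split_records is hypothetical, and accepting the two-character sequence CR LF as a single terminator is an assumption that goes beyond what is stated above.

```python
import re

# LF (ASCII 10), CR (ASCII 13), or CR LF (assumed) ends a line.
LINE_END = re.compile(r"\r\n|\r|\n")


def split_records(raw):
    """Group consecutive non-blank lines into records; blank lines separate them."""
    records, current = [], []
    for line in LINE_END.split(raw):
        if line.strip():
            current.append(line)
        elif current:
            records.append("\n".join(current))
            current = []
    if current:
        records.append("\n".join(current))
    return records


print(split_records("Heart attack.\n\nLung cancer.\n"))
# ['Heart attack.', 'Lung cancer.']
print(split_records("Heart attack.\r\rLung cancer.\r"))
# ['Heart attack.', 'Lung cancer.']
```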

Elimination of Moderate Model

Beginning with MetaMap09, we will no longer release a Moderate Model, but only the Strict and Relaxed Models. Previous versions of MetaMap will still retain their Moderate Models. See Effective Mapping of Biomedical Text to the UMLS Metathesaurus: The MetaMap Program, 2001 for more information about MetaMap's Strict, Moderate, and Relaxed Models.

Elimination of Original Phrases

The display_original_phrases command-line option (-H) will no longer be supported.