Introduction
This document outlines a number of enhancements included in MetaMap 2009, the most important of which are
- NegEx enhancements,
- More options for XML generation,
- A new sentence-breaking algorithm,
- A new input format,
- Elimination of Moderate Model, and
- Elimination of display_original_phrases command-line option.
NegEx Enhancements
MetaMap's initial implementation of NegEx was first described in the original MetaMap08 Release Notes; see The NegEx Project Page for a full explanation of NegEx. MetaMap 2009 includes the following enhancements to our initial implementation, all of which have been reviewed with Wendy Chapman.
Previously, NegEx output appeared only in MetaMap Machine Output (generated via the -q flag). Beginning with MetaMap 2009, if the --negex option is specified and default human-readable output is requested, NegEx output (which is computed for the entire citation) will be displayed before the default human-readable output for the citation.
For example, the input
No pneumonia. Checked for infiltrates.
will generate the following human-readable output, in which the NegEx output is the NEGATIONS block at the top:
NEGATIONS:
Negation Type: nega
Negation Trigger: no
Negation PosInfo: 0/2
Negated Concept: C0032285:Pneumonia
Concept PosInfo: 3/10
Negation Type: nega
Negation Trigger: checked for
Negation PosInfo: 14/11
Negated Concept: C0332448:Infiltrates
Concept PosInfo: 26/12
Processing 00000000.tx.1: No pneumonia.
Phrase: "No pneumonia."
Meta Candidates (3):
1000 Pneumonia [Disease or Syndrome]
907 Lung [Body Part, Organ, or Organ Component]
907 Lung (Entire lung) [Body Part, Organ, or Organ Component]
Meta Mapping (1000):
1000 Pneumonia [Disease or Syndrome]
Processing 00000000.tx.2: Checked for infiltrates.
Phrase: "Checked"
Meta Candidates (1):
966 Check (Checking) [Qualitative Concept]
Meta Mapping (966):
966 Check (Checking) [Qualitative Concept]
Phrase: "for infiltrates."
Meta Candidates (3):
1000 Infiltrates (Infiltration) [Pathologic Function,Therapeutic or Preventive Procedure]
966 Infiltrate [Intellectual Product]
966 Infiltrate (Administration Method - Infiltrate) [Functional Concept]
Meta Mapping (1000):
1000 Infiltrates (Infiltration) [Pathologic Function,Therapeutic or Preventive Procedure]
The most useful forms of NegEx output, however, remain Machine Output and especially XML output, both illustrated below.
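The PosInfo values in the example output above can be read as start/length pairs of character offsets into the input text, counting from 0. The following Python sketch is not part of MetaMap; the helper name span_text is hypothetical. It slices the example input to show which characters each pair covers.

```python
def span_text(text, posinfo):
    """Return the substring covered by a 'start/length' PosInfo string."""
    start, length = (int(part) for part in posinfo.split("/"))
    return text[start:start + length]


# The example input from this section.
text = "No pneumonia. Checked for infiltrates."

# PosInfo values copied from the NEGATIONS block above.
for label, posinfo in [
    ("Negation Trigger", "0/2"),
    ("Negated Concept", "3/10"),
    ("Negation Trigger", "14/11"),
    ("Negated Concept", "26/12"),
]:
    print(f"{label:17} {posinfo:6} {span_text(text, posinfo)!r}")
```

With these offsets, both concept spans take in the sentence-final period ("pneumonia." and "infiltrates.").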
We also added patf and neop to the list of Semantic Types that can license a negation in order to capture negations in phrases such as "no infiltrates" (patf) and "no cancer" (neop). Finally, we added "with no evidence of" to the list of negation triggers, and the five expressions "other than", "otherwise", "to account for", "to explain", and "then" to the list of scope-decreasing conjunctions.
The two most significant changes to our NegEx implementation, however, are the inclusion of the CUI of negated concepts (as displayed above), and collapsing multiple negations into a single one if multiple concepts are associated with the negated term. For example, in previous versions of MetaMap, the text no abnormality would have generated these negations, in which the Machine-Output terms are pretty-printed for readability:
negation(nega, no, [0/2], 'ABNORMALITY', [3/12])
negation(nega, no, [0/2], 'Abnormality', [3/12])
This output incorrectly suggests that there are two distinct negations, whereas there is of course only one, although the negated term is associated with two concepts. In MetaMap 2009, the NegEx output will instead be the following (again, pretty-printed for readability):
negation(nega, no, [0/2], ['C1704258':'Abnormality','C0000768':'ABNORMALITY'], [3/12])
The concepts are displayed as CUI:Concept pairs in a list. (Even if there is only a single concept, that concept will still be displayed in a list, for consistency.) The formatted XML version of the above negation/5 term is the following:
<Negations Count="1">
  <Negation>
    <NegType>nega</NegType>
    <NegTrigger>no</NegTrigger>
    <NTSpans Count="1">
      <Span>
        <StartPos>0</StartPos>
        <SpanLen>2</SpanLen>
      </Span>
    </NTSpans>
    <CUIConcepts Count="2">
      <CUIConcept>
        <NegExCUI>C1704258</NegExCUI>
        <NegExConcept>Abnormality</NegExConcept>
      </CUIConcept>
      <CUIConcept>
        <NegExCUI>C0000768</NegExCUI>
        <NegExConcept>ABNORMALITY</NegExConcept>
      </CUIConcept>
    </CUIConcepts>
    <NCSpans Count="1">
      <Span>
        <StartPos>3</StartPos>
        <SpanLen>12</SpanLen>
      </Span>
    </NCSpans>
  </Negation>
</Negations>
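As an illustration of how a downstream program might consume this XML, the following Python sketch reads the Negations element shown above with the standard-library xml.etree.ElementTree module. The function name read_negations and the dictionary keys are hypothetical choices made for this example; only the element names come from the XML above.

```python
import xml.etree.ElementTree as ET

# The Negations element from the example above.
NEGATIONS_XML = """
<Negations Count="1">
  <Negation>
    <NegType>nega</NegType>
    <NegTrigger>no</NegTrigger>
    <NTSpans Count="1">
      <Span><StartPos>0</StartPos><SpanLen>2</SpanLen></Span>
    </NTSpans>
    <CUIConcepts Count="2">
      <CUIConcept>
        <NegExCUI>C1704258</NegExCUI>
        <NegExConcept>Abnormality</NegExConcept>
      </CUIConcept>
      <CUIConcept>
        <NegExCUI>C0000768</NegExCUI>
        <NegExConcept>ABNORMALITY</NegExConcept>
      </CUIConcept>
    </CUIConcepts>
    <NCSpans Count="1">
      <Span><StartPos>3</StartPos><SpanLen>12</SpanLen></Span>
    </NCSpans>
  </Negation>
</Negations>
"""


def _spans(parent, path):
    """Collect (start, length) integer pairs from the Span elements under path."""
    return [
        (int(span.findtext("StartPos")), int(span.findtext("SpanLen")))
        for span in parent.findall(path)
    ]


def read_negations(xml_text):
    """Return one dictionary per Negation element (hypothetical layout)."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "type": neg.findtext("NegType"),
            "trigger": neg.findtext("NegTrigger"),
            "trigger_spans": _spans(neg, "NTSpans/Span"),
            "concepts": [
                (c.findtext("NegExCUI"), c.findtext("NegExConcept"))
                for c in neg.findall("CUIConcepts/CUIConcept")
            ],
            "concept_spans": _spans(neg, "NCSpans/Span"),
        }
        for neg in root.findall("Negation")
    ]


for negation in read_negations(NEGATIONS_XML):
    print(negation)
```

Because the concepts always arrive as a list, the same loop handles a negated term with one concept or with several.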
Semantic Types that can license a negation:

Semantic Type | Abbrev |
---|---|
Acquired Abnormality | acab |
Anatomical Abnormality | anab |
Biologic Function | biof |
Congenital Abnormality | cgab |
Cell or Molecular Dysfunction | comd |
Disease or Syndrome | dsyn |
Experimental Model of Disease | emod |
Finding | fndg |
Injury or Poisoning | inpo |
Laboratory or Test Result | lbtr |
Mental Process | menp |
Mental or Behavioral Dysfunction | mobd |
Neoplastic Process | neop |
Pathologic Function | patf |
Physiologic Function | phsf |
Sign or Symptom | sosy |
XML Generation
XML generation was first included in the initial release of MetaMap08, as described in the original MetaMap08 Release Notes, and was revised in MetaMap08 V2, as described in the MetaMap08 V2 Release Notes.
After receiving much feedback from users, we are providing in MetaMap09 greater flexibility in the available forms of XML output. Previously, users could request XML output only as either --XML format or --XML noformat (or, equivalently, -% format or -% noformat). The format option generates formatted, pretty-printed XML, whereas the noformat option generates more compact, unformatted XML that is not human-readable. Regardless of the formatting option chosen, however, previous versions of MetaMap generated one XML document for each input record or citation.
Beginning with MetaMap09, we are providing two additional XML options: format1 and noformat1. At a high level, the behavior of format and noformat will remain the same, in that MetaMap will generate an XML document for each input record or citation; the only difference will be the new <MMOlist> element (see below). The XML output for format1 and noformat1, however, will contain one XML document for each input file.
For example, calling metamap09 -% format1 on the following text
Heart attack. Lung cancer.
will generate the XML output
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE MMO PUBLIC "-//NLM//DTD MetaMap Machine Output//EN" "http://ii-public.nlm.nih.gov/DTD/MMOtoXML_v2.dtd">
<MMOlist>
  <MMO>
    . . . XML output for Heart attack. . . .
  </MMO>
  <MMO>
    . . . XML output for Lung cancer. . . .
  </MMO>
</MMOlist>
Similarly, calling metamap09 -% format on the same text will generate the XML output
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE MMO PUBLIC "-//NLM//DTD MetaMap Machine Output//EN" "http://ii-public.nlm.nih.gov/DTD/MMOtoXML_v2.dtd">
<MMOlist>
  <MMO>
    . . . XML output for Heart attack. . . .
  </MMO>
</MMOlist>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE MMO PUBLIC "-//NLM//DTD MetaMap Machine Output//EN" "http://ii-public.nlm.nih.gov/DTD/MMOtoXML_v2.dtd">
<MMOlist>
  <MMO>
    . . . XML output for Lung cancer. . . .
  </MMO>
</MMOlist>
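One practical consequence is that output produced with format or noformat is a sequence of XML documents in a single stream, which a standard XML parser will not accept as one document, whereas format1 or noformat1 output can be parsed in one call. The following Python sketch (Python 3.7 or later) shows one way to handle both cases, assuming each document begins with an XML declaration as in the examples above. The function name parse_documents is hypothetical, and the MMO bodies are placeholders standing in for the elided content.

```python
import re
import xml.etree.ElementTree as ET

DECL = '<?xml version="1.0" encoding="UTF-8"?>'
DOCTYPE = (
    '<!DOCTYPE MMO PUBLIC "-//NLM//DTD MetaMap Machine Output//EN" '
    '"http://ii-public.nlm.nih.gov/DTD/MMOtoXML_v2.dtd">'
)


def parse_documents(stream_text):
    """Split a stream at each XML declaration and parse every piece.

    Works for a stream holding one document or several concatenated ones.
    """
    parts = re.split(r"(?=<\?xml\s)", stream_text)
    return [ET.fromstring(part.strip()) for part in parts if part.strip()]


# Placeholder streams mimicking the two examples above.
format1_output = (
    f"{DECL}\n{DOCTYPE}\n"
    "<MMOlist>\n<MMO>Heart attack.</MMO>\n<MMO>Lung cancer.</MMO>\n</MMOlist>\n"
)
format_output = (
    f"{DECL}\n{DOCTYPE}\n<MMOlist>\n<MMO>Heart attack.</MMO>\n</MMOlist>\n"
    f"{DECL}\n{DOCTYPE}\n<MMOlist>\n<MMO>Lung cancer.</MMO>\n</MMOlist>\n"
)

for name, stream in [("format1", format1_output), ("format", format_output)]:
    docs = parse_documents(stream)
    counts = [len(doc.findall("MMO")) for doc in docs]
    print(f"{name}: {len(docs)} document(s), MMO elements per document: {counts}")
```

The standard-library parser used here does not fetch or validate against the DTD named in the DOCTYPE declaration.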
Sentence-Breaking Algorithm
MetaMap09 now considers the semi-colon (";") as a sentence-breaking character, in addition to the period ("."), the exclamation point ("!"), and the question mark ("?"). These characters are considered sentence boundaries, however, only if they are not immediately preceded by whitespace.
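The rule as stated can be sketched in a few lines of Python. This is only an illustration of the two conditions described above (the four boundary characters, and the requirement that a boundary character not be immediately preceded by whitespace). It is not MetaMap's sentence breaker, which may apply further checks, and the name split_sentences is hypothetical.

```python
import re

# A boundary character (. ! ? ;) counts only when the character
# immediately before it is not whitespace.
BOUNDARY = re.compile(r"(?<=\S)[.!?;]")


def split_sentences(text):
    """Split text after each qualifying boundary character."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        chunk = text[start:match.end()].strip()
        if chunk:
            sentences.append(chunk)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("Fever; no cough. Stable ?"))
```

In this input the semicolon and the period each end a sentence, while the final question mark follows a space and so does not count as a boundary. A sketch this simple would also split inside decimal numbers and abbreviations.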
New Input Format
All previous releases of MetaMap required ASCII 10 (newline or linefeed) as a line-terminating character. In addition, input records or citations had to be separated by blank lines. We have extended MetaMap's input logic to allow ASCII 13 (carriage return) as a line-terminating character as well.
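For programs that prepare or post-process MetaMap input, the same convention can be sketched in Python: lines may end in ASCII 10 or ASCII 13, and blank lines separate records. The function name split_records is hypothetical, and treating a carriage return followed by a linefeed as a single terminator is an assumption of this sketch, not something stated above.

```python
import re

# ASCII 13 + ASCII 10 together (assumption), ASCII 13 alone, or ASCII 10 alone.
LINE_END = re.compile(r"\r\n|\r|\n")


def split_records(raw):
    """Group non-blank lines into records separated by blank lines."""
    records, current = [], []
    for line in LINE_END.split(raw):
        if line.strip():
            current.append(line)
        elif current:
            # A blank line closes the record being collected.
            records.append(current)
            current = []
    if current:
        records.append(current)
    return records


# The same two records, once with linefeeds and once with carriage returns.
print(split_records("Heart attack.\n\nLung cancer.\n"))
print(split_records("Heart attack.\r\rLung cancer.\r"))
```

Both calls produce the same two single-line records, so the choice of line terminator does not affect how the input is grouped.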
Elimination of Moderate Model
Beginning with MetaMap09, we will no longer release a Moderate Model, but only the Strict and Relaxed Models. Previous versions of MetaMap will still retain their Moderate Models. See Effective Mapping of Biomedical Text to the UMLS Metathesaurus: The MetaMap Program, 2001 for more information about MetaMap's Strict, Moderate, and Relaxed Models.
Elimination of Original Phrases
The display_original_phrases command-line option (-H) will no longer be supported.