MetaMap has long provided an output format known as Machine Output, whose pre-MetaMap08 version is described here. MetaMap08 introduced a number of changes to Machine Output, outlined here; the current form of the Machine Output format is fully described here.
Although MetaMap Machine Output is compact, it is made up of Prolog terms and is therefore difficult to parse without using Prolog. We wanted to provide MetaMap users with a more convenient and more widely understood output format such as XML, which has become the de facto lingua franca for internet-based information exchange. MetaMap08 therefore introduced XML output, described here.
Recent feedback from users made us realize that the XML tags in our initial version of MetaMap XML output were not sufficiently mnemonic, so with MetaMap2009 V2 we are changing most of the tags to make them easier to understand. The structure of MetaMap XML output has not changed; only the tag names have.
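The following three tables present, in turn, the mapping from each previous tag to its current tag, the current tags in alphabetical order with their descriptions, and the current tags in the order in which they are nested in the XML output.
Note that the second and third tables present exactly the same information, only arranged differently.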
Previous tag | Current tag |
---|---|
<AAs> <AA> | unchanged |
<AAExpansion> | <AAExp> |
<AALen> | unchanged |
<AAName> | <AAText> |
<Args> | <CmdLine> |
<CUIs> <CUI> | <AACUIs> <AACUI> |
<CUIConcepts> <CUIConcept> | <NegConcepts> <NegConcept> |
<CWMatchPosE> | <ConcMatchEnd> |
<CWMatchPosS> | <ConcMatchStart> |
<Candidates> <Candidate> | unchanged |
<Command> | unchanged |
<DefnLen> | <AAExpLen> |
<InputMatch> | unchanged |
<IsHead> | unchanged |
<IsOverMatch> | unchanged |
<LexMatch> | unchanged |
<Location> | <UttSection> |
<MMOlist> | <MMOs> |
<MMO> | unchanged |
<MapNegScore> | <MappingScore> |
<Mappings> <Mapping> | unchanged |
<MatchedWords> <MatchedWord> | unchanged |
<MatchMaps> <MatchMap> | unchanged |
<NCSpans> <NCSpan> | <NegConcPIs> <NegConcPI> |
<Negations> <Negation> | unchanged |
<NegExCUI> | <NegConcCUI> |
<NegExConcept> | <NegConcMatched> |
<NegScore> | <CandidateScore> |
<NegTrigger> | unchanged |
<NegType> | unchanged |
<NTSpans> <NTSpan> | <NegTriggerPIs> <NegTriggerPI> |
<NumAATokens> | <AATokenNum> |
<NumDefnTokens> | <AAExpTokenNum> |
<Options> <Option> | unchanged |
<OptName> | unchanged |
<OptValue> | unchanged |
<Phrases> <Phrase> | unchanged |
<PMID> | unchanged |
<POS> | <LexCat> |
<PSpanLen> | <PhraseLength> |
<PStartPos> | <PhraseStartPos> |
<PText> | <PhraseText> |
<SeqNo> | <UttNum> |
<Sources> <Source> | unchanged |
<Spans> <Span> | <ConceptPIs> <ConceptPI> |
<SpanLen> | <Length> |
<STs> <ST> | <SemTypes> <SemType> |
<StartPos> | unchanged |
<Tags> <Tag> | <SyntaxUnits> <SyntaxUnit> |
<Tokens> <Token> | unchanged |
<TWMatchPosE> | <TextMatchEnd> |
<TWMatchPosS> | <TextMatchStart> |
<Type> | <SyntaxType> |
<UMLSConcept> | <CandidateMatched> |
<UMLSCUI> | <CandidateCUI> |
<UMLSPreferred> | <CandidatePreferred> |
<USpanLen> | <UttLength> |
<UStartPos> | <UttStartPos> |
<UText> | <UttText> |
<Utterances> <Utterance> | unchanged |
<Variation> | <LexVariation> |
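Because only the tag names changed and the structure did not, XML produced with the previous tags can in principle be converted by renaming elements. The following is a minimal sketch of that idea, assuming each previous tag maps to exactly one current tag regardless of where it appears, as the table suggests. The `TAG_RENAMES` dictionary is only an excerpt of the table, `rename_tags` is a hypothetical helper, and the input fragment and its CUI are illustrative placeholders rather than real MetaMap output.

```python
import xml.etree.ElementTree as ET

# Excerpt of the previous-tag -> current-tag table; extend as needed.
# Tags that are not listed here are left unchanged.
TAG_RENAMES = {
    "MMOlist": "MMOs",
    "Args": "CmdLine",
    "AAName": "AAText",
    "AAExpansion": "AAExp",
    "Spans": "ConceptPIs",
    "Span": "ConceptPI",
    "SpanLen": "Length",
    "UMLSCUI": "CandidateCUI",
    "UMLSConcept": "CandidateMatched",
    "UMLSPreferred": "CandidatePreferred",
    "NegScore": "CandidateScore",
    "MapNegScore": "MappingScore",
    "PText": "PhraseText",
    "UText": "UttText",
}


def rename_tags(root):
    """Rename every element in place according to TAG_RENAMES."""
    for elem in root.iter():
        elem.tag = TAG_RENAMES.get(elem.tag, elem.tag)
    return root


# Illustrative fragment using previous tag names (not real MetaMap output).
old_xml = (
    "<MMOlist><MMO><Candidate>"
    "<UMLSCUI>C0000001</UMLSCUI><NegScore>-1000</NegScore>"
    "</Candidate></MMO></MMOlist>"
)
new_xml = ET.tostring(rename_tags(ET.fromstring(old_xml)), encoding="unicode")
print(new_xml)
```

Element text and nesting are untouched, so the converted document keeps the same structure as the original.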
In the "Description" fields below, a repeating tag is followed by "+" if at least one occurrence is mandatory and by "*" if it may be absent.
Note that the <Candidate> tag appears as <Candidate>+ within a <Mapping> structure, but as <Candidate>* within a <Phrase> structure.
This is correct: if a mapping is created, it must necessarily be composed of candidates, whereas a phrase need not have any associated candidates if MetaMap is unable to identify any Metathesaurus candidates in it.
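The "+" versus "*" distinction can be checked mechanically. The sketch below, written under the assumption that a <Candidate> may sit anywhere beneath its <Mapping> or <Phrase>, flags any mapping that has no candidate while accepting phrases without candidates. The helper name `mappings_without_candidates`, the sample fragment, and the placeholder CUI are all hypothetical.

```python
import xml.etree.ElementTree as ET


def mappings_without_candidates(root):
    """Return <Mapping> elements that violate the <Candidate>+ rule."""
    return [m for m in root.iter("Mapping") if m.find(".//Candidate") is None]


# Illustrative fragment (not real MetaMap output): the first phrase has no
# candidates, which is allowed; the second mapping has none, which is not.
SAMPLE = """
<Phrases>
  <Phrase><PhraseText>of</PhraseText></Phrase>
  <Phrase>
    <PhraseText>pneumonia</PhraseText>
    <Mappings>
      <Mapping>
        <MappingScore>-1000</MappingScore>
        <Candidate><CandidateCUI>C0000001</CandidateCUI></Candidate>
      </Mapping>
      <Mapping><MappingScore>-1000</MappingScore></Mapping>
    </Mappings>
  </Phrase>
</Phrases>
"""

root = ET.fromstring(SAMPLE)
bad_mappings = mappings_without_candidates(root)
empty_phrases = [p for p in root.iter("Phrase") if p.find(".//Candidate") is None]
print(len(bad_mappings), "invalid mapping(s);", len(empty_phrases), "phrase(s) without candidates")
```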
Tag | Type | Description |
---|---|---|
<AAs> <AA> | All the data generated for an Acronym/Abbreviation (AA); the AA examples in this table use polymerase chain reaction (PCR) | |
<AACUIs> <AACUI> | Any CUIs associated with the expansion of the AA | |
<AAExp> | The expansion of the AA (polymerase chain reaction) | |
<AAExpLen> | The character length of the expansion of the AA (25, because polymerase chain reaction contains 25 characters) | |
<AAExpTokenNum> | The number of tokens in the AA expansion (5, because polymerase chain reaction contains 5 tokens, including two blank tokens) | |
<AALen> | The character length of the AA (3, because PCR contains 3 characters) | |
<AAText> | The AA itself (PCR) | |
<AATokenNum> | The number of tokens in the AA (1, because PCR contains 1 token) | |
<Candidates> <Candidate> | All the data generated for a candidate concept | |
<CandidateCUI> | The CUI of the candidate concept | |
<CandidateMatched> | The candidate concept matched | |
<CandidatePreferred> | The preferred name of the candidate concept | |
<CandidateScore> | The negative score of the candidate concept; the computation of this value is explained on pp. 5-9 of MetaMap Evaluation. | |
<CmdLine> | All the data about the command used to start MetaMap, consisting of <Command> and <Options> | |
<Command> | The actual operating-system call used to start MetaMap | |
<ConceptPIs> <ConceptPI> | The StartPos/Length positional information of the concept | |
<ConcMatchEnd> | The position within the concept words of the last matching word | |
<ConcMatchStart> | The position within the concept words of the first matching word | |
<InputMatch> | The input word(s) making up the syntax unit | |
<IsHead> | Yes/no value denoting if the candidate concept includes the head of the phrase containing it | |
<IsOverMatch> | Yes/no value denoting if the candidate concept is an overmatch, i.e., if it contains words on one or both ends that do not match the input text. | |
<Length> | The character length of the string | |
<LexCat> | The lexical category of the syntax unit; the LexCats are adj, adv, aux, compl, conj, det, modal, noun, prep, pron, and verb. | |
<LexMatch> | The lexical item(s) matched by the syntax unit | |
<LexVariation> | The degree of lexical variation between the words in the candidate concept and the words in the phrase; the computation of this value is explained on pp. 2-3 of MetaMap Evaluation. | |
<Mappings> <Mapping> | A set of candidate concepts making up the mapping for the phrase | |
<MappingScore> | The negative score of the mapping; the computation of this value is explained on pp. 9-10 of MetaMap Evaluation. | |
<MatchedWords> <MatchedWord> | The word(s) in the Candidate matching the text | |
<MatchMaps> <MatchMap> | A data structure representing the correspondence between the words of the phrase and the words of the candidate concept | |
<MMOs> <MMO> | All the XML output generated for an entire input record or citation | |
<Negations> <Negation> | All the data generated for a negation | |
<NegConcCUI> | The CUI associated with the negated concept | |
<NegConcepts> <NegConcept> | The negated concept | |
<NegConcMatched> | The name of the negated concept | |
<NegConcPIs> <NegConcPI> | The StartPos/Length positional information of the negated concept | |
<NegTrigger> | The negation trigger | |
<NegTriggerPIs> <NegTriggerPI> | The StartPos/Length positional information of the negation trigger | |
<NegType> | The negation type | |
<Options> <Option> | The option(s) passed to MetaMap, each consisting of <OptName> and <OptValue> | |
<OptName> | The name of the command-line option | |
<OptValue> | The value of the command-line option (can be null) | |
<Phrases> <Phrase> | The syntactic subcomponent of the utterance | |
<PhraseLength> | The character length of the phrase | |
<PhraseStartPos> | The 0-based character offset of the phrase, counting from the beginning of the input text | |
<PhraseText> | The text of the phrase | |
<PMID> | The PubMed ID of the citation containing the utterance | |
<SemTypes> <SemType> | The semantic type(s) of the candidate | |
<Sources> <Source> | The UMLS vocabulary/ies in which the concept was found | |
<StartPos> | The 0-based character offset of the string, counting from the beginning of the input text | |
<SyntaxType> | The syntactic type of the syntax unit; the SyntaxTypes are adv, aux, compl, conj, det, head, mod, modal, pastpart, prep, pron, punc, shapes, and verb. | |
<SyntaxUnits> <SyntaxUnit> | The syntactic subcomponent of the phrase | |
<TextMatchEnd> | The position within the phrase words of the last matching word | |
<TextMatchStart> | The position within the phrase words of the first matching word | |
<Tokens> <Token> | The tokens making up the lexical items | |
<Utterances> <Utterance> | All the data generated for an utterance | |
<UttLength> | The character length of the utterance | |
<UttNum> | The 1-based numerical position of the utterance within the section | |
<UttSection> | The section type (e.g., title or abstract) of the utterance | |
<UttStartPos> | The 0-based character offset of the utterance, counting from the beginning of the input text | |
<UttText> | The text of the utterance | |
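The positional tags pair a 0-based <StartPos> with a <Length>, so the addressed text can be recovered by slicing the original input. The sketch below illustrates this for a <ConceptPI> element, assuming both values count characters of the input text exactly as supplied to MetaMap. The helper `span_text`, the input sentence, and the offsets are made up for illustration.

```python
import xml.etree.ElementTree as ET


def span_text(input_text, pi_element):
    """Return the substring addressed by a <StartPos>/<Length> pair."""
    start = int(pi_element.findtext("StartPos"))   # 0-based character offset
    length = int(pi_element.findtext("Length"))    # number of characters
    return input_text[start:start + length]


# Illustrative input and positional element (not real MetaMap output).
input_text = "No evidence of pneumonia."
concept_pi = ET.fromstring(
    "<ConceptPI><StartPos>15</StartPos><Length>9</Length></ConceptPI>"
)
matched = span_text(input_text, concept_pi)
print(matched)
```

The same slice applies to <NegConcPI> and <NegTriggerPI>, which the table describes as carrying the same StartPos/Length information.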
Tag | Type | Description |
---|---|---|
<MMOs> <MMO> | All the XML output generated for an entire input record or citation | |
<CmdLine> | All the data about the command used to start MetaMap, consisting of <Command> and <Options> | |
<Command> | The actual operating-system call used to start MetaMap | |
<Options> <Option> | The option(s) passed to MetaMap, each consisting of <OptName> and <OptValue> | |
<OptName> | The name of the command-line option | |
<OptValue> | The value of the command-line option (can be null) | |
<AAs> <AA> | All the data generated for an Acronym/Abbreviation (AA); the AA examples in this table use polymerase chain reaction (PCR) | |
<AAText> | The AA itself (PCR) | |
<AAExp> | The expansion of the AA (polymerase chain reaction) | |
<AATokenNum> | The number of tokens in the AA (1, because PCR contains 1 token) | |
<AALen> | The character length of the AA (3, because PCR contains 3 characters) | |
<AAExpTokenNum> | The number of tokens in the AA expansion (5, because polymerase chain reaction contains 5 tokens, including two blank tokens) | |
<AAExpLen> | The character length of the expansion of the AA (25, because polymerase chain reaction contains 25 characters) | |
<AACUIs> <AACUI> | Any CUIs associated with the expansion of the AA | |
<Negations> <Negation> | All the data generated for a negation | |
<NegType> | The negation type | |
<NegTrigger> | The negation trigger | |
<NegTriggerPIs> <NegTriggerPI> | The StartPos/Length positional information of the negation trigger | |
<NegConcepts> <NegConcept> | The negated concept | |
<NegConcCUI> | The CUI associated with the negated concept | |
<NegConcMatched> | The name of the negated concept | |
<NegConcPIs> <NegConcPI> | The StartPos/Length positional information of the negated concept | |
<Utterances> <Utterance> | All the data generated for an utterance | |
<PMID> | The PubMed ID of the citation containing the utterance | |
<UttSection> | The section type (e.g., title or abstract) of the utterance | |
<UttNum> | The 1-based numerical position of the utterance within the section | |
<UttText> | The text of the utterance | |
<UttStartPos> | The 0-based character offset of the utterance, counting from the beginning of the input text | |
<UttLength> | The character length of the utterance | |
<Phrases> <Phrase> | The syntactic subcomponent of the utterance | |
<PhraseText> | The text of the phrase | |
<SyntaxUnits> <SyntaxUnit> | The syntactic subcomponent of the phrase | |
<SyntaxType> | The syntactic type of the syntax unit; the SyntaxTypes are adv, aux, compl, conj, det, head, mod, modal, pastpart, prep, pron, punc, shapes, and verb. | |
<LexMatch> | The lexical item(s) matched by the syntax unit | |
<InputMatch> | The input word(s) making up the syntax unit | |
<LexCat> | The lexical category of the syntax unit; the LexCats are adj, adv, aux, compl, conj, det, modal, noun, prep, pron, and verb. | |
<Tokens> <Token> | The tokens making up the lexical items | |
<PhraseStartPos> | The 0-based character offset of the phrase, counting from the beginning of the input text | |
<PhraseLength> | The character length of the phrase | |
<Candidates> <Candidate> | All the data generated for a candidate concept | |
<CandidateScore> | The negative score of the candidate concept; the computation of this value is explained on pp. 5-9 of MetaMap Evaluation. | |
<CandidateCUI> | The CUI of the candidate concept | |
<CandidateMatched> | The candidate concept matched | |
<CandidatePreferred> | The preferred name of the candidate concept | |
<MatchedWords> <MatchedWord> | The word(s) in the Candidate matching the text | |
<SemTypes> <SemType> | The semantic type(s) of the candidate | |
<MatchMaps> <MatchMap> | A data structure representing the correspondence between the words of the phrase and the words of the candidate concept | |
<TextMatchStart> | The position within the phrase words of the first matching word | |
<TextMatchEnd> | The position within the phrase words of the last matching word | |
<ConcMatchStart> | The position within the concept words of the first matching word | |
<ConcMatchEnd> | The position within the concept words of the last matching word | |
<LexVariation> | The degree of lexical variation between the words in the candidate concept and the words in the phrase; the computation of this value is explained on pp. 2-3 of MetaMap Evaluation. | |
<IsHead> | Yes/no value denoting if the candidate concept includes the head of the phrase containing it | |
<IsOverMatch> | Yes/no value denoting if the candidate concept is an overmatch, i.e., if it contains words on one or both ends that do not match the input text. | |
<Sources> <Source> | The UMLS vocabulary/ies in which the concept was found | |
<ConceptPIs> <ConceptPI> | The StartPos/Length positional information of the concept | |
<StartPos> | The 0-based character offset of the string, counting from the beginning of the input text | |
<Length> | The character length of the string | |
<Mappings> <Mapping> | A set of candidate concepts making up the mapping for the phrase | |
<MappingScore> | The negative score of the mapping; the computation of this value is explained on pp. 9-10 of MetaMap Evaluation. | |
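The nesting shown in this table can be walked with any XML parser. The following sketch lists the candidates of each phrase, assuming that each <Phrase> directly contains a <Candidates> element holding its <Candidate> children and that a more negative <CandidateScore> denotes a stronger match. The function `candidates_by_phrase`, the sample document, and the CUIs and concept names in it are placeholders, not real MetaMap output.

```python
import xml.etree.ElementTree as ET

# Illustrative document following the tag nesting in the table above.
SAMPLE = """
<MMOs><MMO><Utterances><Utterance>
  <UttText>heart attack</UttText>
  <Phrases><Phrase>
    <PhraseText>heart attack</PhraseText>
    <Candidates>
      <Candidate>
        <CandidateScore>-861</CandidateScore>
        <CandidateCUI>C0000002</CandidateCUI>
        <CandidatePreferred>Example concept B</CandidatePreferred>
      </Candidate>
      <Candidate>
        <CandidateScore>-1000</CandidateScore>
        <CandidateCUI>C0000001</CandidateCUI>
        <CandidatePreferred>Example concept A</CandidatePreferred>
      </Candidate>
    </Candidates>
  </Phrase></Phrases>
</Utterance></Utterances></MMO></MMOs>
"""


def candidates_by_phrase(root):
    """Return [(phrase_text, [(score, cui, preferred), ...]), ...]."""
    result = []
    for phrase in root.iter("Phrase"):
        rows = [
            (
                int(cand.findtext("CandidateScore")),
                cand.findtext("CandidateCUI"),
                cand.findtext("CandidatePreferred"),
            )
            for cand in phrase.findall("Candidates/Candidate")
        ]
        rows.sort(key=lambda row: row[0])  # most negative score first
        result.append((phrase.findtext("PhraseText", default=""), rows))
    return result


phrases = candidates_by_phrase(ET.fromstring(SAMPLE))
for text, rows in phrases:
    for score, cui, preferred in rows:
        print(text, score, cui, preferred)
```

Using `findall("Candidates/Candidate")` instead of a descendant search keeps candidates that belong to a <Mapping> out of the phrase-level list.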