MetaMap maps text (from documents or queries) to concepts in the UMLS Metathesaurus. Text passes through a series of modules that break it down into sentences, phrases, lexical elements and tokens. Variants are generated from the resulting phrases, and candidate concepts from the UMLS Metathesaurus are retrieved and evaluated against those phrases. The concepts that best cover the text are then organized into what is known as a final mapping.
MetaMap Options:
MetaMap is highly configurable, and its behavior is controlled by option flags, each of which has a short name (e.g., -I) and a long name (e.g., --show_cuis).
File Options:
When MetaMap is run on the command line, the default input and output are standard input and output. MetaMap allows specifying input and output files on the command line, but the order in which they are specified is important:
The InputFile and OutputFile arguments, if specified, must be the last two arguments. It is not necessary to specify OutputFile, because the output file will default to <InputFile>.out. Note that if the output file (whether specified on the command line or not) is an existing file, the existing file will be overwritten and its original contents lost.
Note for MMTX users: Unlike MMTX, MetaMap has no option flags for specifying the input and output file names. The file names are given directly on the command line, without option flags, and if no file name is specified, MetaMap uses standard input and standard output.
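The ordering rule above can be sketched as a small Python helper that assembles an argument list. This is only an illustration: the executable name "metamap" and the helper names are assumptions, and only the placement of the file arguments and the <InputFile>.out default come from the description above.

```python
from typing import List, Optional


def default_output_file(input_file: str) -> str:
    """Return the output name used when no OutputFile is given."""
    return input_file + ".out"


def build_command(options: List[str],
                  input_file: Optional[str] = None,
                  output_file: Optional[str] = None,
                  executable: str = "metamap") -> List[str]:
    """Assemble an argument list with the file names as the last arguments.

    `executable` is a hypothetical name; adjust it to the local installation.
    """
    if output_file is not None and input_file is None:
        raise ValueError("an output file requires an input file")
    command = [executable, *options]
    # File names always come after every option flag.
    if input_file is not None:
        command.append(input_file)
    if output_file is not None:
        command.append(output_file)
    return command
```

For example, build_command(["-I"], "notes.txt") places notes.txt last, and by the default rule the results would go to notes.txt.out, overwriting any existing file of that name.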
Data Options:
Data options determine the underlying vocabularies and data model used by MetaMap.
-A (--strict_model) [default]
-C (--relaxed_model)
- determines which data model is used. If more than one model is specified, the strictest one is used; if none is specified, the strict model is used. See the report "Filtering the UMLS Metathesaurus for MetaMap" at the SKR website (under "Technical Documents") for a description of the models.
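The model-selection rule can be expressed as a short Python sketch. It is not MetaMap's own code; the function and constant names are hypothetical, and only the rule itself (strictest model wins, strict is the default) is taken from the description above.

```python
from typing import Iterable

STRICT = "strict"
RELAXED = "relaxed"

# Models ordered from strictest to least strict.
_STRICTNESS_ORDER = [STRICT, RELAXED]

_FLAG_TO_MODEL = {
    "-A": STRICT, "--strict_model": STRICT,
    "-C": RELAXED, "--relaxed_model": RELAXED,
}


def effective_model(flags: Iterable[str]) -> str:
    """Return the data model implied by the given option flags."""
    requested = {_FLAG_TO_MODEL[f] for f in flags if f in _FLAG_TO_MODEL}
    for model in _STRICTNESS_ORDER:
        if model in requested:
            return model  # the strictest requested model wins
    return STRICT  # no model flag given: the strict model is the default
```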
-V (--mm_data_version) <data version>
- specifies which version of MetaMap's data files will be used for processing. For example, 2004 specifies one of the UMLS 2004 models, and 2004_level0 specifies one of the level 0 UMLS 2004 models. NOTE: because "normal" processing is the default, this option should very rarely be used.
The default data version is:
normal
- All vocabularies in a given AA release of the Metathesaurus with the exception of the AMA vocabularies, CPT (Current Procedural Terminology) and CDT (Current Dental Terminology). Also excluded from the normal data version are CPT and CDT derivative vocabularies such as HCPCS (Healthcare Common Procedure Coding System) and MTHHH (Metathesaurus HCPCS Hierarchical Terms).
Other data versions that are sometimes available are:
level0
- UMLS vocabularies with the least restrictive source restriction level, namely level 0. Even level 0 vocabularies have some copyright restrictions, but they are less restrictive than those with restriction levels 1 through 3.
level0and4
- Level 0 and level 4 vocabularies. Currently SNOMEDCT and its derivatives are the only level 4 vocabularies in the Metathesaurus. (Note that, despite the numbering, level 4 is not as restrictive as levels 1 through 3, especially for USA users.)
Processing Options:
Processing options control MetaMap's internal behavior.
-@ (--WSD) <hostname>
- specifies the host name of the machine running the WSD Server to be used for word-sense disambiguation.
-+ (--bracketed_output)
- surrounds the Phrase, Candidates, and Mappings sections of output with ">>>>>" and "<<<<<" brackets. E.g.,
>>>>> Phrase
heart attack
<<<<< Phrase
and similarly for Candidates and Mappings.
-8 (--dynamic_variant_generation)
- forces MetaMap to generate variants dynamically rather than by looking up variants in a table. This option is normally used only for debugging purposes.
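The bracket markers written by --bracketed_output make the output easy to split mechanically. The following Python sketch is not part of MetaMap; it assumes each section opens with ">>>>> Name" and closes with "<<<<< Name" on their own lines, as in the Phrase example given for that option. The function name is hypothetical.

```python
from typing import List, Optional, Tuple

OPEN, CLOSE = ">>>>>", "<<<<<"


def split_bracketed(output: str) -> List[Tuple[str, List[str]]]:
    """Split bracketed output into (section name, body lines) pairs."""
    sections: List[Tuple[str, List[str]]] = []
    name: Optional[str] = None
    body: List[str] = []
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith(OPEN):
            # Start of a section: remember its name and reset the body.
            name = stripped[len(OPEN):].strip()
            body = []
        elif stripped.startswith(CLOSE) and name is not None:
            # End of the current section.
            sections.append((name, body))
            name = None
        elif name is not None:
            body.append(line)
    return sections
```

Lines that fall outside any bracketed section are ignored by this sketch.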
-a (--all_acros_abbrs)
- allows the use of any acronym/abbreviation variants, which are the least reliable form of variation, because normally at most one of the expansions for an abbreviated form is correct.
-d (--no_derivational_variants)
- prevents the use of any derivational variation in the computation of word variants. This option exists because derivational variants, as opposed to all other forms of variation, always involve a significant change in meaning.
-D (--all_derivational_variants)
- forces the use of all derivational variation, instead of only those between adjectives and nouns. Adjective/noun derivational variants are generally the best derivational variants.
-g (--allow_concept_gaps)
- causes MetaMap to retrieve Metathesaurus candidates with gaps (such as "Unspecified childhood psychosis" for "unspecified psychosis"). This option does not appreciably affect MetaMap's performance. It is appropriate for browsing purposes.
-i (--ignore_word_order)
- allows MetaMap to ignore the order of words in the phrases it processes. MetaMap was originally developed to process full text and consequently depended very strongly on normal English word order. This option avoids the use of specialized word indexes used for efficient candidate retrieval, it ignores word order when matching phrase text to candidate words, and it replaces the normal coverage metric with an involvement metric for evaluating how well a candidate covers the words of a phrase.
-K (--ignore_stop_phrases)
- simply prevents MetaMap from aborting its processing for commonly occurring phrases that are known to produce no mappings. This option is useful only for generating a new table of stop phrases after a change in UMLS data.
-l (--allow_large_n)
- enables retrieval of Metathesaurus candidates for two-character words occurring in more than 2,000 Metathesaurus strings and one-character words occurring in more than 1,000 Metathesaurus strings. This option also allows retrieval for words that can be a preposition, conjunction or determiner.
-o (--allow_overmatches)
- causes MetaMap to retrieve Metathesaurus candidates which have words on one or both ends that do not match the text. For example, overmatches of "medicine" include 'Alternative Medicine', 'Medical Records' and 'Nuclear medicine procedure, NOS'. The use of --allow_overmatches greatly increases the number of candidates retrieved and is consequently much slower than MetaMap without overmatches. It is appropriate for browsing purposes.
-P (--composite_phrases)
- causes MetaMap to construct longer, composite phrases from the simple phrases produced by the parser. A composite phrase is a simple phrase followed by a prepositional phrase, optionally followed by one or more further prepositional phrases. An example is "pain on the left side of the chest", which maps to 'Left sided chest pain' rather than to separate concepts as it would without the option. Note that --composite_phrases is experimental; it is currently both inefficient and not completely correct.
-Q (--quick_composite_phrases)
- is a version of --composite_phrases designed to overcome its inefficiency. It is both experimental and temporary.
-S (--tagger) <hostname>
- specifies the host name of the machine running the Tagger Server to be used for tagging.
-t (--no_tagging)
- prevents the tagger from being called. By default, the SPECIALIST parser uses the results of a tagger to assist in parsing. We previously used the Xerox PARC part-of-speech tagger but now use the MedPost/SKR tagger. The MedPost tagger was developed at NCBI specifically for tagging biomedical text; we modified it to use our part-of-speech tags.
-u (--unique_acros_abbrs_only)
- restricts the generation of acronym/abbreviation variants to those forms with unique expansions. This option produces better results than allowing all forms of acronym/abbreviation variants, but it is still better to prevent all such variants.
-U (--allow_duplicate_concept_names)
- requires that two Concepts' CUIs match (in addition to the Metathesaurus Concept itself and its position in the current phrase) in order for an evaluation to be considered redundant.
-y (--word_sense_disambiguation)
- causes MetaMap to attempt to disambiguate among concepts scoring equally well in matching input text. The initial implementation of MetaMap Word Sense Disambiguation uses a single method that chooses a concept (or concepts) having the most likely semantic type for the context in which the ambiguity arises.
-Y (--prefer_multiple_concepts)
- causes MetaMap to score mappings with more concepts higher than those with fewer concepts. (It does so simply by inverting the normal cohesiveness value.) As a simplified example, with this option in effect, the input text "lung cancer" will score the mapping to the two concepts 'Lung' and 'Cancer' higher than the mapping to the single concept 'Lung Cancer'. This option is useful for discovering higher-order relationships among concepts found in text (e.g., that 'Lung' is the location of 'Cancer' in the example).
-z (--term_processing)
- tells MetaMap to process terms rather than full text. When invoked, MetaMap treats each input as a single phrase (although the parser is still used to determine the head of that phrase). It also causes MetaMap to use the involvement metric rather than coverage for evaluating Metathesaurus candidates. When used in conjunction with the --allow_overmatches and --allow_concept_gaps options, it constitutes MetaMap's browse mode for thorough searching of the Metathesaurus. In this case it is wise to also specify -m (--hide_mappings) to turn mapping construction off; otherwise, MetaMap spends too much time trying to combine the many candidates into final mappings.
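As a rough illustration of the browse mode described above, the following Python sketch collects the four short flags into one argument list. The executable name "metamap", the input file name and the helper name are assumptions; the flags themselves are the ones listed in this document.

```python
from typing import List

# Short flags that together make up browse mode, per the description above.
BROWSE_MODE_OPTIONS = [
    "-z",  # term processing
    "-o",  # allow overmatches
    "-g",  # allow concept gaps
    "-m",  # turn off mappings to avoid the cost of combining many candidates
]


def browse_command(input_file: str, executable: str = "metamap") -> List[str]:
    """Build a browse-mode argument list; the input file goes last."""
    return [executable, *BROWSE_MODE_OPTIONS, input_file]
```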
Output Options:
Output options control how MetaMap displays results.
-b (--compute_all_mappings)
- forces MetaMap to compute and display all mappings, rather than only the top scoring ones. Note: It is almost never useful to display all mappings because of their large number.
-c (--hide_candidates)
- disables the display of the list of Metathesaurus candidates. By default, candidates are displayed best to worst, according to the MetaMap evaluation metric. Note that (assuming this option is not selected) if a candidate is not the preferred name for a concept, the preferred name is displayed in parentheses immediately following the candidate. Displaying both the matching string and the preferred concept name when they differ is intended to avoid any confusion about why a concept appears on the candidate list. It is generally useful to display both the candidate list and the final mappings.
-e (--exclude_sources) <list>
- excludes those sources in the comma-separated <list>; spaces are not allowed in the list.
-E (--indicate_citation_end)
- causes an end-of-transmission term to be written when processing of each unit of input is complete. It is only useful for processing using the Scheduler and only then with validated generic processing.
-G (--sources)
- displays the Metathesaurus sources for each candidate and mapping in the output.
-I (--show_cuis)
- shows the UMLS CUI for each concept displayed.
-j (--dump_aas)
- displays the acronyms and abbreviations discovered by MetaMap in the following form:
AA|PMID|Acronym|Expansion|#Acronym Tokens|#Acronym Chars|#Expansion Tokens|#Expansion Chars
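A record in this pipe-delimited form can be read with a few lines of Python. The sketch below assumes that the first field is the literal tag "AA", that the four count fields are integers, and that no field contains a "|" character; the class and function names are hypothetical, and the sample record in the usage note is made up.

```python
from dataclasses import dataclass


@dataclass
class AcronymRecord:
    pmid: str
    acronym: str
    expansion: str
    acronym_tokens: int
    acronym_chars: int
    expansion_tokens: int
    expansion_chars: int


def parse_aa_line(line: str) -> AcronymRecord:
    """Parse one 'AA|PMID|Acronym|Expansion|...' record into a dataclass."""
    fields = line.rstrip("\n").split("|")
    if len(fields) != 8 or fields[0] != "AA":
        raise ValueError(f"not an AA record: {line!r}")
    _, pmid, acronym, expansion, *counts = fields
    # The last four fields are token and character counts.
    return AcronymRecord(pmid, acronym, expansion, *map(int, counts))
```

For instance, parse_aa_line("AA|00000000|MI|myocardial infarction|1|2|3|21") yields a record whose acronym is "MI" and whose expansion is "myocardial infarction".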
-J (--restrict_to_sts) <list>
- restricts output to those concepts with semantic types in the comma-separated <list>; spaces are not allowed in the list.
-k (--exclude_sts) <list>
- excludes concepts having a semantic type in the comma-separated <list>; spaces are not allowed in the list.
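Because the <list> arguments of options such as -e, -J and -k must be comma-separated with no spaces, it can help to build them programmatically. The Python helper below is a hypothetical convenience, not part of MetaMap, and the semantic type abbreviations in the usage note are illustrative only.

```python
from typing import Iterable, List


def list_argument(items: Iterable[str]) -> str:
    """Join items into a comma-separated list that contains no spaces."""
    cleaned: List[str] = []
    for item in items:
        item = item.strip()  # tolerate stray surrounding whitespace
        if not item or " " in item or "," in item:
            raise ValueError(f"invalid list item: {item!r}")
        cleaned.append(item)
    if not cleaned:
        raise ValueError("the list must contain at least one item")
    return ",".join(cleaned)
```

For example, ["-J", list_argument(["dsyn", "neop"])] produces the two arguments -J and dsyn,neop.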
-m (--hide_mappings)
- disables the display of mappings. As noted above, it is generally useful to display both the candidate list and the final mappings.
-M (--mmi_output)
- displays, in a separate section, the concepts from the highest-scoring mappings and their semantic types.
-n (--number_the_candidates)
- simply numbers the candidates in a displayed candidate list.
-N (--fielded_mmi_output)
- displays, in a separate section, a ranked list of all the mappings assigned to the text. Additional data such as the PMID of the citation, CUIs, and abbreviated semantic types are also included.
-O (--show_preferred_names_only)
- causes MetaMap to display only the preferred name of each concept, rather than both the matching string and the preferred name.
-p (--hide_plain_syntax)
- disables the display of the words forming each phrase, as determined by the SPECIALIST parser.
-q (--machine_output)
- causes output to take the form of Prolog terms rather than human-readable form. The --machine_output option affects all other output options. For further information on machine output, including visually enhanced examples, see the SKR Help page.
-r (--threshold) <integer>
- restricts output to candidates whose evaluation score equals or exceeds the specified threshold. Judicious use of this option can prevent MetaMap from making errors in situations where some input text has no close matches in the Metathesaurus. An appropriate threshold can usually be determined simply by examining MetaMap output for typical text in a given application.
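The effect of a threshold can be pictured with a small Python sketch. The (score, name) representation of a candidate and the scores in the usage note are made up for illustration and are not MetaMap's internal format; only the "equals or exceeds" rule comes from the description above.

```python
from typing import List, Tuple

Candidate = Tuple[int, str]  # (evaluation score, concept name), hypothetical


def apply_threshold(candidates: List[Candidate], threshold: int) -> List[Candidate]:
    """Keep only candidates whose score equals or exceeds the threshold."""
    return [c for c in candidates if c[0] >= threshold]
```

With candidates [(1000, "A"), (861, "B"), (694, "C")] and a threshold of 861, the first two are kept and the third is dropped.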
-R (--restrict_to_sources) <list>
- restricts output to those sources in the comma-separated <list>; spaces are not allowed in the list.
-s (--hide_semantic_types)
- disables the display of the semantic types of Metathesaurus concepts. By default, the semantic types of Metathesaurus concepts are displayed in square brackets for each concept in the candidate list and the mappings.
-T (--tagger_output)
- displays the output of the MedPost/SKR tagger, lining up input words on one line with their tags on the line below.
-v (--variants)
- displays the variants generated for each input word.
-W (--preferred_name_sources)
- lists all sources for the preferred names of displayed concepts. Note that this is just one of many possible choices for showing sources; showing all sources for any synonym in a concept, for example, would often produce very cluttered output.
-x (--syntax)
- controls the output form of the results of the SPECIALIST parser. It outputs a Prolog term showing details of the syntactic processing.
-X (--truncate_candidates_mappings)
- first truncates the list of candidates to the 100 top-scoring ones before computing mappings and then truncates the list of mappings to the 8 top-scoring ones. This option can sometimes prevent a combinatorial explosion caused by computing a large number of mappings from a large number of candidates as is often encountered when using --allow_overmatches.
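The two truncation steps amount to keeping the top-scoring items of each list. The Python sketch below illustrates the idea with the limits quoted above; the tuple representation and the function name are assumptions, not MetaMap internals.

```python
from typing import Callable, List, Tuple

MAX_CANDIDATES = 100  # candidates kept before mappings are computed
MAX_MAPPINGS = 8      # mappings kept afterwards

Scored = Tuple[int, str]  # (score, label), a hypothetical representation


def top_scoring(items: List[Scored], limit: int,
                score: Callable[[Scored], int] = lambda item: item[0]) -> List[Scored]:
    """Return at most `limit` items, highest score first."""
    return sorted(items, key=score, reverse=True)[:limit]
```

Truncating the candidates first bounds the number of combinations that the mapping step has to consider, which is where the combinatorial explosion would otherwise arise.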
-% (--XML) <option>
- generates XML output. The options are format, noformat, format1, and noformat1. The options format and format1 provide formatted, pretty-printed XML output, while noformat and noformat1 provide concise, unformatted XML output. See "MetaMap 2009 Release Notes" (http://metamap.nlm.nih.gov/MM09__Release__Notes.shtml) for more information.
--negex
- outputs a list of negated UMLS concepts occurring in the input and the associated strings that caused the negation.
--no_header_info
- suppresses printing of informational messages at the beginning of a MetaMap session.
--phrases_only
- (for debugging purposes only)
--warnings
- (for debugging purposes only)
Miscellaneous Options:
--help
- displays MetaMap usage, i.e., the form of the command and a list of all options. This option has no short form, and must therefore be invoked as --help.