MetaMap 2007 Usage

Usage: metamap07 [Options] [Input File] [Output File]

MetaMap maps (matches) text, such as documents and queries, to concepts in the UMLS Metathesaurus. Text passes through a series of modules that break it down into sentences, phrases, lexical elements and tokens. Variants are generated from the resulting phrases, and candidate concepts from the UMLS Metathesaurus are retrieved and evaluated against their phrases. The resulting concepts are then organized so as to best cover the text; this organization is known as a final mapping.

MetaMap Options:

PLEASE NOTE: This is important for using the MetaMap options correctly!
On the command line, most of the options are toggle switches. Specifying a non-default option toggles it on; specifying a default option toggles it off. Options that take an argument are never defaults, so their presence always indicates that they are in effect. (Excerpted from MetaMap: Mapping Text to the UMLS Metathesaurus, July 2006 (PDF - 280 kb).)

MetaMap is highly configurable. Its behavior is controlled by option flags, each of which has a short name (e.g., -p) and a long name (e.g., --plain_syntax).

Default Options: MetaMap's default behavior consists of the following options.
-a (--no_acros_abbrs)
-b (--best_mappings_only)
-c (--candidates)
-l (--stop_large_n)
-m (--mappings)
-p (--plain_syntax)
-s (--semantic_types)
-t (--tag_text)
-D (--an_derivational_variants)
Remember: Specifying default options on the command line will turn them off.
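
The toggle rule can be modeled in a few lines. The following is only a sketch of the behavior described above, not MetaMap's own code: the function name effective_options and the use of long option names without dashes are assumptions made for illustration, and options that take an argument are not modeled.

```python
# Illustrative model of the toggle rule; not part of MetaMap itself.
DEFAULT_OPTIONS = frozenset({
    "no_acros_abbrs", "best_mappings_only", "candidates", "stop_large_n",
    "mappings", "plain_syntax", "semantic_types", "tag_text",
    "an_derivational_variants",
})


def effective_options(specified):
    """Return the toggle options in effect after the command line is read."""
    active = set(DEFAULT_OPTIONS)
    for name in specified:
        if name in DEFAULT_OPTIONS:
            active.discard(name)  # naming a default option turns it off
        else:
            active.add(name)      # naming a non-default option turns it on
    return active


active = effective_options(["mappings", "show_cuis"])
print("mappings" in active)    # False: the default was switched off
print("show_cuis" in active)   # True: the non-default was switched on
```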

File Options: If you run MetaMap on the command line, the default input and output are standard input and output. MetaMap does allow for specifying the input and output files on the command line, but the order in which they are specified is important!
% metamap07 [Options] InputFile OutputFile
Note for MMTX users: Unlike MMTX, MetaMap has no option flags for specifying the input and output file names. The file names are given directly on the command line, without option flags, and if no file name is given, MetaMap uses standard input and standard output.
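
A script that launches MetaMap has to respect this positional order. The sketch below assembles an argument list in that order; the helper name build_metamap_argv and the file names are hypothetical, and the rule that an output file requires an input file is an assumption that follows from the arguments being positional.

```python
def build_metamap_argv(options, input_file=None, output_file=None):
    """Assemble program name, options, then input file and output file."""
    if output_file is not None and input_file is None:
        # Assumption: a lone file name would be read as the input file.
        raise ValueError("an output file can only follow an input file")
    argv = ["metamap07", *options]
    if input_file is not None:
        argv.append(input_file)
    if output_file is not None:
        argv.append(output_file)
    return argv


print(build_metamap_argv(["-I"], "input.txt", "output.txt"))
# ['metamap07', '-I', 'input.txt', 'output.txt']
```

With no file names, the list contains only the program name and options, which corresponds to reading standard input and writing standard output.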

Data Options: Data options determine the underlying vocabularies and data model used by MetaMap.
-A (--strict_model)
-B (--moderate_model)
-C (--relaxed_model)
determine which data model is used. If more than one model is specified, the strictest one is used; if none is specified, the strict model is used. See the report Filtering the UMLS Metathesaurus for MetaMap on the SKR website (under "Technical Documents") for a description of the models.
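
The selection rule can be written out as follows. This is a sketch only: the function name resolve_model is hypothetical, and the ordering strict, moderate, relaxed (strictest first) is inferred from the model names.

```python
MODEL_FLAGS = {"-A": "strict", "-B": "moderate", "-C": "relaxed"}
STRICTNESS_ORDER = ["strict", "moderate", "relaxed"]  # strictest first (assumed)


def resolve_model(flags):
    """Pick the strictest requested model; fall back to strict if none is given."""
    requested = {MODEL_FLAGS[flag] for flag in flags if flag in MODEL_FLAGS}
    for model in STRICTNESS_ORDER:
        if model in requested:
            return model
    return "strict"


print(resolve_model(["-C", "-B"]))  # moderate
print(resolve_model([]))            # strict
```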

-V (--mm_data_version) <data version>
specifies which version of MetaMap's data files will be used for processing. For example, 2004 specifies one of the UMLS 2004 models, and 2004_level0 specifies one of the level 0 UMLS 2004 models.

The default data version is:
normal: All vocabularies in a given AA release of the Metathesaurus with the exception of the AMA vocabularies, CPT (Current Procedural Terminology) and CDT (Current Dental Terminology). Also excluded from the normal data version are CPT and CDT derivative vocabularies such as HCPCS (Healthcare Common Procedure Coding System) and MTHHH (Metathesaurus HCPCS Hierarchical Terms).

Other data versions that are sometimes available are:
level0: UMLS vocabularies with the least restrictive source restriction level, namely level 0. Even level 0 vocabularies have some copyright restrictions, but they are less restrictive than those with restriction levels 1 through 3; and
level0and4: Level 0 and level 4 vocabularies. Currently SNOMEDCT and its derivatives are the only level 4 vocabularies in the Metathesaurus. (Note that, despite the numbering, level 4 is not as restrictive as levels 1 through 3, especially for USA users.)

Processing Options: Processing options control MetaMap's internal behavior.
-@ (--force_WSD_server_choice OPTION)
specifies the hostname running the WSD Server to be used for word-sense disambiguation.
-+ (--bracketed_output)
surrounds the Phrase, Candidates, and Mappings sections of output with ">>>>>" and "<<<<<" brackets. E.g.,
>>>>> Phrase
heart attack
<<<<< Phrase
and similarly for Candidates and Mappings.
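
Bracketed output is convenient for programs that need to pull individual sections out of MetaMap's results. The following sketch splits text into named sections based only on the bracket lines shown above. The function name split_bracketed is hypothetical, and real output may contain details this sketch does not handle.

```python
import re

OPEN = re.compile(r"^>>>>> (\w+)$")
CLOSE = re.compile(r"^<<<<< (\w+)$")


def split_bracketed(text):
    """Return a list of (section_name, lines) pairs from bracketed output."""
    sections = []
    name, body = None, []
    for line in text.splitlines():
        opened = OPEN.match(line)
        closed = CLOSE.match(line)
        if name is None and opened:
            name, body = opened.group(1), []   # start of a section
        elif name is not None and closed and closed.group(1) == name:
            sections.append((name, body))      # matching end bracket
            name = None
        elif name is not None:
            body.append(line)                  # line inside a section
    return sections


sample = ">>>>> Phrase\nheart attack\n<<<<< Phrase\n"
print(split_bracketed(sample))  # [('Phrase', ['heart attack'])]
```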
-8 (--dynamic_variant_generation)
forces MetaMap to generate variants dynamically rather than by looking up variants in a table. This option is normally used only for debugging purposes.
-a (--no_acros_abbrs) [DEFAULT]
prevents the use of any acronym/abbreviation variants, which are the least reliable form of variation because normally at most one of the expansions for an abbreviated form is correct.
-d (--no_derivational_variants)
prevents the use of any derivational variation in the computation of word variants. This option exists because derivational variants, as opposed to all other forms of variation, always involve a significant change in meaning.
-D (--an_derivational_variants) [DEFAULT]
allows the use of derivational variation between adjectives and nouns, hence the name an_derivational variants. Adjective/noun derivational variants are generally the best of the derivational variants.
-g (--allow_concept_gaps)
causes MetaMap to retrieve Metathesaurus candidates with gaps (such as "Unspecified childhood psychosis" for "unspecified psychosis"). This option does not appreciably affect MetaMap's performance. It is appropriate for browsing purposes.
-i (--ignore_word_order)
allows MetaMap to ignore the order of words in the phrases it processes. MetaMap was originally developed to process full text and consequently depended very strongly on normal English word order. This option avoids the use of specialized word indexes used for efficient candidate retrieval, it ignores word order when matching phrase text to candidate words, and it replaces the normal coverage metric with an involvement metric for evaluating how well a candidate covers the words of a phrase.
-K (--ignore_stop_phrases)
simply prevents MetaMap from aborting its processing for commonly occurring phrases that are known to produce no mappings. This option is useful only for generating a new table of stop phrases after a change in UMLS data.
-l (--stop_large_n) [DEFAULT]
prevents retrieval of Metathesaurus candidates for two-character words occurring in more than 2,000 Metathesaurus strings or one-character words occurring in more than 1,000 Metathesaurus strings. This option also prevents retrieval for words that can be a preposition, conjunction or determiner.
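
The retrieval rule above amounts to a small predicate. In this sketch the function name is_stopped is hypothetical, and the string count and the part-of-speech flag are assumed to be supplied by the caller; MetaMap obtains them from its own data files and lexicon.

```python
def is_stopped(word, string_count, is_function_word=False):
    """Return True if candidate retrieval would be skipped for this word.

    string_count is the number of Metathesaurus strings containing the word;
    is_function_word marks prepositions, conjunctions and determiners.
    """
    if is_function_word:
        return True
    if len(word) == 1:
        return string_count > 1000
    if len(word) == 2:
        return string_count > 2000
    return False


print(is_stopped("a", 1001))       # True
print(is_stopped("mg", 2000))      # False: not more than 2,000
print(is_stopped("heart", 50000))  # False: longer words are not limited
```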
-L (--longest_lexicon_match)
causes lexical lookup to prefer matching as much text as possible to lexicon entries. This used to be the only form of lexical lookup, but it has been superseded by a shortest-match algorithm. The reason is that the SPECIALIST lexicon is a syntactic lexicon: multi-word items contain no more information than their constituents, which have their own lexicon entries.
-o (--allow_overmatches)
causes MetaMap to retrieve Metathesaurus candidates which have words on one or both ends that do not match the text. For example, overmatches of "medicine" include 'Alternative Medicine', 'Medical Records' and 'Nuclear medicine procedure, NOS'. The use of --allow_overmatches greatly increases the number of candidates retrieved and is consequently much slower than MetaMap without overmatches. It is appropriate for browsing purposes.
-P (--composite_phrases)
causes MetaMap to construct longer, composite phrases from the simple phrases produced by the parser. A composite phrase is a simple phrase followed by any prepositional phrase, optionally followed by one or more "of" prepositional phrases. An example is "pain on the left side of the chest", which maps to 'Left sided chest pain' rather than to separate concepts as it would without the option. Note that --composite_phrases is experimental; it is currently both inefficient and not completely correct.
-Q (--quick_composite_phrases)
is a version of --composite_phrases designed to overcome its inefficiency. It is both experimental and temporary.
-S (--force_tagger_choice OPTION)
specifies the hostname running the Tagger Server to be used for tagging.
-t (--tag_text) [DEFAULT]
specifies that the SPECIALIST parser will use the results of a tagger to assist in parsing. We previously used the Xerox PARC part-of-speech tagger but now use the MedPost/SKR tagger. The MedPost tagger was developed at NCBI specifically for tagging biomedical text; we modified it to use our part-of-speech tags.
-u (--unique_acros_abbrs_only)
restricts the generation of acronym/abbreviation variants to those forms with unique expansions. This option produces better results than allowing all forms of acronym/abbreviation variants, but it is still better to prevent all such variants.
-U (--allow_duplicate_concept_names)
requires that two Concepts' CUIs match (in addition to the Metathesaurus Concept itself and its position in the current phrase) in order for an evaluation to be considered redundant.
-y (--word_sense_disambiguation)
causes MetaMap to attempt to disambiguate among concepts scoring equally well in matching input text. The initial implementation of MetaMap Word Sense Disambiguation uses a single method that chooses a concept (or concepts) having the most likely semantic type for the context in which the ambiguity arises.
-Y (--prefer_multiple_concepts)
causes MetaMap to score mappings with more concepts higher than those with fewer concepts. (It does so simply by inverting the normal cohesiveness value.) As a simplified example, with this option in effect, the input text "lung cancer" will score the mapping to the two concepts 'Lung' and 'Cancer' higher than the mapping to the single concept 'Lung Cancer'. This option is useful for discovering higher-order relationships among concepts found in text (e.g., that 'Lung' is the location of 'Cancer' in the example).
-z (--term_processing)
tells MetaMap to process terms rather than full text. When invoked, MetaMap treats each input as a single phrase (although the parser is still used to determine the head of that phrase). It also causes MetaMap to use the involvement metric rather than coverage for evaluating Metathesaurus candidates. When used in conjunction with the --allow_overmatches and --allow_concept_gaps options, it constitutes MetaMap's browse mode for thorough searching of the Metathesaurus. In this case it is wise to also specify --mappings to toggle mapping construction off; otherwise, MetaMap spends too much time trying to combine the many candidates into final mappings.

Output Options: Output options control how MetaMap displays results.
-b (--best_mappings_only) [DEFAULT]
restricts mappings displayed to only the top scoring ones. It is almost never useful to display all mappings because of their large number.
-c (--candidates) [DEFAULT]
causes the list of Metathesaurus candidates to be displayed, best to worst, according to the MetaMap evaluation metric. Note that if a candidate is not the preferred name for a concept, the preferred name is displayed in parentheses immediately following the candidate. Displaying both the matching string and the preferred concept name when they differ is intended to avoid any confusion about why a concept appears on the candidate list.
-e (--exclude_sources) <list>
excludes those sources in the comma-separated <list> where spaces are not allowed.
-E (--indicate_citation_end)
causes an end-of-transmission term to be written when processing of each unit of input is complete. It is useful only for processing with the Scheduler, and then only with validated generic processing.
-f (--fielded_output)
produces multi-line, tab-delimited output. Like machine output, it affects all other output options. For further information on fielded output, including visually enhanced examples, see the SKR Help page.
-G (--sources)
displays the Metathesaurus sources for each candidate and mapping in the output.
-H (--display_original_phrases)
displays the original (unexpanded) text of phrases rather than the expanded form that is produced when acronyms are replaced by their definitions. Even if this option is used, it is the expanded form that determines MetaMap's output.
-I (--show_cuis)
shows the UMLS CUI for each concept displayed.
-j (--dump_aas)
displays the Acronyms and Abbreviations discovered by MetaMap in the following form:
AA|PMID|Acronym|Expansion|#Acronym Tokens|#Acronym Chars|#Expansion Tokens|#Expansion Chars
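
A record in this form can be split on the "|" character. The sketch below assumes that the first field is the literal string AA, that the four count fields are integers, and that no field contains a "|" itself. The class name AARecord, the function name parse_aa_line and the sample line are made up for illustration and are not taken from real MetaMap output.

```python
from dataclasses import dataclass


@dataclass
class AARecord:
    pmid: str
    acronym: str
    expansion: str
    acronym_tokens: int
    acronym_chars: int
    expansion_tokens: int
    expansion_chars: int


def parse_aa_line(line):
    """Parse one pipe-delimited acronym/abbreviation record."""
    fields = line.rstrip("\n").split("|")
    if len(fields) != 8 or fields[0] != "AA":
        raise ValueError(f"not an AA record: {line!r}")
    _, pmid, acronym, expansion, *counts = fields
    return AARecord(pmid, acronym, expansion, *map(int, counts))


# Made-up sample line, for illustration only.
record = parse_aa_line("AA|00000000|MI|myocardial infarction|1|2|2|21")
print(record.acronym, "->", record.expansion)
```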
-J (--restrict_to_sts) <list>
restricts output to those concepts with semantic types in the comma-separated <list> where spaces are not allowed.
-k (--exclude_sts) <list>
excludes concepts having a semantic type in the comma-separated <list> where spaces are not allowed.
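
Because the list arguments of -e, -J, -k and -R may not contain spaces, a calling script may want to validate list items before joining them. The helper below is a sketch with a hypothetical name, list_argument. The semantic type abbreviations dsyn and neop are used only as illustrative values.

```python
def list_argument(items):
    """Join items into a comma-separated list, rejecting empty or spaced items."""
    if not items:
        raise ValueError("the list must not be empty")
    for item in items:
        if not item or "," in item or any(ch.isspace() for ch in item):
            raise ValueError(f"invalid list item: {item!r}")
    return ",".join(items)


print(["-J", list_argument(["dsyn", "neop"])])  # ['-J', 'dsyn,neop']
```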
-m (--mappings) [DEFAULT]
causes mappings to be displayed. It is generally useful to display both the candidate list and the final mappings.
-M (--mmi_output)
displays, in a separate section, the concepts from the highest-scoring mappings and their Semantic Types.
-n (--number_the_candidates)
simply numbers the candidates in a displayed candidate list.
-N (--fielded_mmi_output)
displays, in a separate section, a ranked list of all the mappings assigned to the text. Additional data, such as the PMID of the citation, CUIs and abbreviated Semantic Types, are also included.
-O (--show_preferred_names_only)
prevents MetaMap from displaying both the matching string and the preferred name when it displays concepts.
-p (--plain_syntax) [DEFAULT]
controls the output form of the results of the SPECIALIST parser. It simply outputs text without any syntactic information.
-q (--machine_output)
causes output to take the form of Prolog terms rather than human-readable form. The --machine_output option affects all other output options. For further information on machine output, including visually enhanced examples, see the SKR Help page.
-r (--threshold) <integer>
restricts output to candidates whose evaluation score equals or exceeds the specified threshold. Judicious use of this option can prevent MetaMap from making errors in situations where some input text has no close matches in the Metathesaurus. An appropriate threshold can usually be determined simply by examining MetaMap output for typical text in a given application.
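
The effect of a threshold can be pictured as a simple filter over scored candidates. In this sketch the function name apply_threshold is hypothetical and the scores and candidate names are made up; only the comparison (score equals or exceeds the threshold) comes from the description above.

```python
def apply_threshold(candidates, threshold):
    """Keep (score, name) pairs whose score equals or exceeds the threshold."""
    return [(score, name) for score, name in candidates if score >= threshold]


# Made-up scores, for illustration only.
candidates = [(1000, "Heart attack"), (861, "Heart"), (694, "Attack")]
print(apply_threshold(candidates, 800))
# [(1000, 'Heart attack'), (861, 'Heart')]
```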
-R (--restrict_to_sources) <list>
restricts output to those sources in the comma-separated <list>; spaces are not allowed in the list.
-s (--semantic_types) [DEFAULT]
causes the semantic types of Metathesaurus concepts to be displayed in square brackets for each concept in the candidate list or in a mapping.
-T (--tagger_output)
displays the output of the MedPost/SKR tagger, lining up input words on one line with their tags on the line below.
-v (--variants)
displays the variants generated for each input word.
-W (--preferred_name_sources)
lists all sources for the preferred names of displayed concepts. Note that this is just one of many possible choices for showing sources; showing all sources for any synonym in a concept, for example, would often produce very cluttered output.
-x (--syntax)
controls the output form of the results of the SPECIALIST parser. It outputs a Prolog term showing details of the syntactic processing.
-X (--truncate_candidates_mappings)
first truncates the list of candidates to the 100 top-scoring ones before computing mappings and then truncates the list of mappings to the 8 top-scoring ones. This option can sometimes prevent a combinatorial explosion caused by computing a large number of mappings from a large number of candidates as is often encountered when using --allow_overmatches.

Miscellaneous Options:
-h (--help)
displays MetaMap usage, i.e., the form of the command and a list of all options.