Usage
Usage: metamap14 Options InputFile OutputFile
MetaMap maps (matches) text from documents or queries to concepts in the UMLS Metathesaurus. Text passes through a series of modules and is broken down into components that include sentences, phrases, lexical elements and tokens. Variants are generated from the resulting phrases, and candidate concepts from the UMLS Metathesaurus are retrieved and evaluated against those phrases. The resulting concepts are then organized to best cover the text; the result is known as a final mapping.
MetaMap Options:
MetaMap is highly configurable, and its behavior is controlled by option flags. Most options have both a short name (e.g., -I) and a long name (e.g., --show_cuis); some, such as --prune and --help, have only a long name.
File Options:
When MetaMap is run on the command line, the default input and output are standard input and output. MetaMap allows specifying input and output files on the command line, but the order in which they are specified is important:
% metamap13 [Options] InputFile OutputFile
The InputFile and OutputFile arguments, if specified, must be the last two arguments. It is not necessary to specify OutputFile, because the output file will default to <InputFile>.out. Note that if the output file (whether specified on the command line or not) is an existing file, the existing file will be overwritten and its original contents lost.
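The placement rule can be illustrated with a minimal Python sketch. It is not part of MetaMap: `build_metamap_argv` and `default_output_name` are hypothetical helpers, and the program name `metamap14` is taken from the usage line above (substitute the name of your installed release). The sketch only assembles an argument list, for example to hand to `subprocess.run`; it does not run MetaMap.

```python
from typing import List, Optional


def default_output_name(input_file: str) -> str:
    """Output file name used when OutputFile is omitted: <InputFile>.out."""
    return input_file + ".out"


def build_metamap_argv(options: List[str],
                       input_file: Optional[str] = None,
                       output_file: Optional[str] = None,
                       program: str = "metamap14") -> List[str]:
    """Assemble a MetaMap command line (hypothetical helper).

    Options come first; InputFile and OutputFile, if given, must be the
    last two arguments. With no InputFile, MetaMap reads standard input
    and writes standard output.
    """
    if output_file is not None and input_file is None:
        raise ValueError("OutputFile cannot be given without InputFile")
    argv = [program, *options]
    if input_file is not None:
        argv.append(input_file)
        if output_file is not None:
            argv.append(output_file)
    return argv


# Show CUIs and sources; with no OutputFile, output goes to abstracts.txt.out.
cmd = build_metamap_argv(["-I", "-G"], "abstracts.txt")
print(cmd)                                   # ['metamap14', '-I', '-G', 'abstracts.txt']
print(default_output_name("abstracts.txt"))  # abstracts.txt.out
```

Because an existing output file is overwritten without warning, a wrapper like this is a natural place to check `os.path.exists()` on the output name before launching MetaMap.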
Note For MMTX Users:
Note that, unlike MMTX, MetaMap has no option flags for specifying the input and output file names. The file names are given directly on the command line, without flags, and if no file name is specified, MetaMap uses standard input and output.
Data Options:
Data options determine the underlying vocabularies and data model used by MetaMap.
- -A (--strict_model)
- -C (--relaxed_model)
- determine which data model is used. If more than one model is specified, the strictest one is used; if none is specified, the strict model is used. See the report "Filtering the UMLS Metathesaurus for MetaMap" at the SKR website (http://skr.nlm.nih.gov/papers/index.shtml), under "Technical Documents", for a description of the models.
- -V (--mm_data_version) <data version>
- specifies which version of MetaMap's data files will be used for processing. For example, 2004 specifies one of the UMLS 2004 models, and 2004_level0 specifies one of the level 0 UMLS 2004 models. NOTE: because "normal" processing is the default, this option should very rarely be used.
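The precedence rule for -A and -C reduces to a few lines. This is a minimal Python sketch assuming only the two models described here; `resolve_data_model` is a hypothetical helper for illustration, not part of MetaMap.

```python
from typing import Iterable

STRICT_FLAGS = {"-A", "--strict_model"}
RELAXED_FLAGS = {"-C", "--relaxed_model"}


def resolve_data_model(flags: Iterable[str]) -> str:
    """Return "strict" or "relaxed" following the -A/-C precedence rule."""
    given = set(flags)
    if given & STRICT_FLAGS:
        return "strict"   # the strictest specified model wins
    if given & RELAXED_FLAGS:
        return "relaxed"
    return "strict"       # no model specified: strict is the default


print(resolve_data_model(["-A", "-C"]))  # strict
print(resolve_data_model(["-C"]))        # relaxed
```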
The default data version is:
- USAbase
- The USAbase data version includes those source vocabularies with no associated restrictions beyond a UMLS license and free to use in US-based projects; this version includes the Base vocabularies (those with Restriction Category 0), plus the five Category-4 sources and the four Category-9 sources (including, most notably, SNOMEDCT). The USAbase version is a proper superset of the Base version, and might be the most appropriate version for users with a SNOMEDCT license. To repeat: this data version is MetaMap's default, but the default can be overridden by specifying another user-installed data version with the -V flag (see Additional Data Sets).
IMPORTANT NOTE: Users without a SNOMEDCT license should use the Base data version, selected with the -V flag.
Other data versions that are sometimes available are:
- Base
- The Base data version includes those source vocabularies with no associated licensing restrictions beyond those of the UMLS license; this version includes all and only sources of Restriction Category 0.
- NLM
- All vocabularies in a given AA release of the Metathesaurus with the exception of CPT (Current Procedural Terminology, from the AMA) and CDT (Current Dental Terminology). Also excluded from this data version are CPT and CDT derivative vocabularies such as HCPCS (Healthcare Common Procedure Coding System) and MTHHH (Metathesaurus HCPCS Hierarchical Terms).
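The licensing guidance above can be expressed as a small decision helper. This is a sketch under stated assumptions: `data_version_args` is hypothetical, and it assumes the data version is passed to -V by the name listed above (Base); check the names of the data versions actually installed with your release.

```python
from typing import List


def data_version_args(has_snomedct_license: bool) -> List[str]:
    """Return the extra -V arguments for a licensing situation (hypothetical helper).

    USAbase is MetaMap's default data version and includes SNOMEDCT, so
    no -V flag is needed for users with a SNOMEDCT license. Users without
    one should select the Base data version instead.
    """
    if has_snomedct_license:
        return []            # keep the USAbase default
    return ["-V", "Base"]    # assumption: the version is selected by this name


print(data_version_args(False))  # ['-V', 'Base']
```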
Processing Options:
Processing options control MetaMap's internal behavior.
- --prune <number of candidates>
- The --prune option allows the user to specify the maximum number of candidates used to construct mappings.
- -@ (--WSD) <hostname>
- specifies the hostname running the WSD Server to be used for word-sense disambiguation.
- -+ (--bracketed_output)
- surrounds the Phrase, Candidates, and Mappings sections of the output with ">>>>>" and "<<<<<" brackets. E.g.,
>>>>> Phrase heart attack <<<<< Phrase
and similarly for Candidates and Mappings.
- -8 (--dynamic_variant_generation)
- forces MetaMap to generate variants dynamically rather than by looking up variants in a table. This option is normally used only for debugging purposes.
- -a (--all_acros_abbrs)
- allows the use of any acronym/abbreviation variants, which are the least reliable form of variation, because normally at most one of the expansions for an abbreviated form is correct.
- -d (--no_derivational_variants)
- prevents the use of any derivational variation in the computation of word variants. This option exists because derivational variants, as opposed to all other forms of variation, always involve a significant change in meaning.
- -D (--all_derivational_variants)
- forces the use of all derivational variation, instead of only those between adjectives and nouns. Adjective/noun derivational variants are generally the best derivational variants.
- -g (--allow_concept_gaps)
- causes MetaMap to retrieve Metathesaurus candidates with gaps (such as "Unspecified childhood psychosis" for "unspecified psychosis"). This option does not appreciably affect MetaMap's performance. It is appropriate for browsing purposes.
- -i (--ignore_word_order)
- allows MetaMap to ignore the order of words in the phrases it processes. MetaMap was originally developed to process full text and consequently depended very strongly on normal English word order. This option avoids the specialized word indexes used for efficient candidate retrieval, ignores word order when matching phrase text to candidate words, and replaces the normal coverage metric with an involvement metric for evaluating how well a candidate covers the words of a phrase.
- -K (--ignore_stop_phrases)
- simply prevents MetaMap from aborting its processing for commonly occurring phrases that are known to produce no mappings. This option is useful only for generating a new table of stop phrases after a change in UMLS data.
- -l (--allow_large_n)
- enables retrieval of Metathesaurus candidates for two-character words occurring in more than 2,000 Metathesaurus strings and one-character words occurring in more than 1,000 Metathesaurus strings. This option also allows retrieval for words that can be a preposition, conjunction or determiner.
- -L (--lexicon_year) <year>
- specifies, by year, which lexicon to use.
- -o (--allow_overmatches)
- causes MetaMap to retrieve Metathesaurus candidates that have words on one or both ends that do not match the text. For example, overmatches of "medicine" include 'Alternative Medicine', 'Medical Records' and 'Nuclear medicine procedure, NOS'. Using --allow_overmatches greatly increases the number of candidates retrieved, so MetaMap runs much more slowly than it does without overmatches. It is appropriate for browsing purposes.
- (--composite_phrases) <integer>
- causes MetaMap to construct longer, composite phrases from the smaller phrases produced by the parser. The integer operand specifies the number of prepositional phrases that can be attached to the initial noun phrase. MetaMap users may see increased recall with composite phrases (e.g., -Q 2, -Q 3, or even -Q 4), because they enable the identification of concepts such as Left sided chest pain (C0541828) from the text "pain on the left side of the chest".
- -Q (--quick_composite_phrases) <integer>
- is a faster version of --composite_phrases, designed to overcome that option's inefficiency.
- -S (--TAGGER_SERVER) <hostname>
- specifies the hostname running the Tagger Server to be used for tagging.
- -t (--no_tagging)
- prevents the tagger from being called. By default, the SPECIALIST parser uses the results of a part-of-speech tagger to assist in parsing. We previously used the Xerox PARC part-of-speech tagger but now use the MedPost/SKR tagger. The MedPost tagger was developed at NCBI specifically for tagging biomedical text; we modified it to use our part-of-speech tags.
- -u (--unique_acros_abbrs_only)
- restricts the generation of acronym/abbreviation variants to those forms with unique expansions. This option produces better results than allowing all forms of acronym/abbreviation variants, but it is still better to prevent all such variants.
- -y (--word_sense_disambiguation)
- causes MetaMap to attempt to disambiguate among concepts scoring equally well in matching input text. The initial implementation of MetaMap Word Sense Disambiguation uses a single method that chooses a concept (or concepts) having the most likely semantic type for the context in which the ambiguity arises.
- -Y (--prefer_multiple_concepts)
- causes MetaMap to score mappings with more concepts higher than those with fewer concepts. (It does so simply by inverting the normal cohesiveness value.) As a simplified example, with this option in effect, the input text "lung cancer" will score the mapping to the two concepts 'Lung' and 'Cancer' higher than the mapping to the single concept 'Lung Cancer'. This option is useful for discovering higher-order relationships among concepts found in text (e.g., that 'Lung' is the location of 'Cancer' in the example).
- -z (--term_processing)
- tells MetaMap to process terms rather than full text. When invoked, MetaMap treats each input as a single phrase (although the parser is still used to determine the head of that phrase). It also causes MetaMap to use the involvement metric rather than coverage for evaluating Metathesaurus candidates. When used in conjunction with the --allow_overmatches and --allow_concept_gaps options, it constitutes MetaMap's browse mode for thorough searching of the Metathesaurus. In this case it is wise to also specify -m (--hide_mappings) to turn mapping construction off; otherwise, MetaMap spends too much time trying to combine the many candidates into final mappings.
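Output produced with -+ (--bracketed_output) is easy to split into its sections with a regular expression. A minimal Python sketch, assuming each section opens with ">>>>> Name" and closes with "<<<<< Name" for the same name, as in the example for that option; since the example does not say whether a section body spans several lines, the pattern accepts both forms. `bracketed_sections` is a hypothetical helper, not part of MetaMap.

```python
import re
from typing import List, Tuple

# ">>>>> Name", then the body, then "<<<<< Name" with the same name (\1).
# re.DOTALL lets the body span several lines.
SECTION_RE = re.compile(r">>>>> (\w+)\s*(.*?)\s*<<<<< \1", re.DOTALL)


def bracketed_sections(text: str) -> List[Tuple[str, str]]:
    """Return (section name, body) pairs in the order they appear."""
    return [(m.group(1), m.group(2)) for m in SECTION_RE.finditer(text)]


print(bracketed_sections(">>>>> Phrase heart attack <<<<< Phrase"))
# [('Phrase', 'heart attack')]
```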
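The idea behind --composite_phrases (attaching up to a given number of prepositional phrases to the initial noun phrase) can be illustrated with a deliberately simplified Python model. This is not MetaMap's implementation: `compose_phrase` is hypothetical, the parser's phrases are assumed to be given as plain strings in text order, and a phrase is treated as prepositional when its first word is in a small illustrative preposition list.

```python
from typing import List

# Illustrative subset only; a real parser decides phrase types itself.
PREPOSITIONS = {"of", "on", "in", "with", "for", "at", "to", "from", "by"}


def compose_phrase(phrases: List[str], max_pps: int) -> str:
    """Attach up to max_pps consecutive prepositional phrases to phrases[0]."""
    composite = [phrases[0]]
    for phrase in phrases[1:]:
        if len(composite) - 1 >= max_pps:
            break                      # operand limit reached
        words = phrase.split()
        if not words or words[0].lower() not in PREPOSITIONS:
            break                      # only consecutive PPs are attached
        composite.append(phrase)
    return " ".join(composite)


chunks = ["pain", "on the left side", "of the chest"]
print(compose_phrase(chunks, 2))  # pain on the left side of the chest
print(compose_phrase(chunks, 1))  # pain on the left side
```

With an operand of 2, the composite phrase covers the whole text, which is what allows a single concept such as Left sided chest pain to be matched.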
Output Options:
Output options control how MetaMap displays results.
- -b (--compute_all_mappings)
- forces MetaMap to compute and display all mappings, rather than only the top scoring ones. Note: It is almost never useful to display all mappings because of their large number.
- -c (--hide_candidates)
- disables the display of the list of Metathesaurus candidates. By default, candidates are displayed best to worst, according to the MetaMap evaluation metric. Note that (assuming this option is not selected) if a candidate is not the preferred name for a concept, the preferred name is displayed in parentheses immediately following the candidate. Displaying both the matching string and the preferred concept name when they differ is intended to avoid any confusion about why a concept appears on the candidate list. It is generally useful to display both the candidate list and the final mappings.
- -e (--exclude_sources) <list>
- excludes the sources in the comma-separated <list>; spaces are not allowed in the list.
- -E (--indicate_citation_end)
- causes an end-of-transmission term to be written when processing of each unit of input is complete. It is only useful for processing using the Scheduler and only then with validated generic processing.
- -F (--formal_tagger_output)
- displays the tagging information returned by the tagger server as Prolog terms.
- -G (--sources)
- displays the Metathesaurus sources for each candidate and mapping in the output.
- -I (--show_cuis)
- shows the UMLS CUI for each concept displayed.
- -j (--dump_aas)
- displays the Acronyms and Abbreviations discovered by MetaMap in the following form:
AA|PMID|Acronym|Expansion|#Acronym Tokens|#Acronym Chars|#ExpansionTokens|#Expansion Chars
- -J (--restrict_to_sts) <list>
- restricts output to those concepts with semantic types in the comma-separated <list>; spaces are not allowed in the list.
- -k (--exclude_sts) <list>
- excludes concepts having a semantic type in the comma-separated <list>; spaces are not allowed in the list.
- -m (--hide_mappings)
- disables the display of mappings. As noted above, it is generally useful to display both the candidate list and the final mappings.
- -M (--mmi_output)
- displays, in a separate section, the concepts from the highest-scoring mappings and their semantic types.
- -n (--number_the_candidates)
- simply numbers the candidates in a displayed candidate list.
- -N (--fielded_mmi_output)
- displays, in a separate section, a ranked list of all the mappings assigned to the text. Additional data, such as the PMID of the citation, CUIs, and abbreviated semantic types, are also included.
- -O (--show_preferred_names_only)
- prevents MetaMap from displaying both the matching string and the preferred name when it displays concepts; only the preferred name is shown.
- -p (--hide_plain_syntax)
- disables the display of the words forming each phrase, as determined by the SPECIALIST parser.
- -q (--machine_output)
- causes output to take the form of Prolog terms rather than human-readable form. The --machine_output option affects all other output options. For further information on machine output, including visually enhanced examples, see the SKR Help page (http://skr.nlm.nih.gov/Help).
- -r (--threshold) <integer>
- restricts output to candidates whose evaluation score equals or exceeds the specified threshold. Judicious use of this option can prevent MetaMap from making errors in situations where some input text has no close matches in the Metathesaurus. An appropriate threshold can usually be determined simply by examining MetaMap output for typical text in a given application.
- -R (--restrict_to_sources) <list>
- restricts output to those sources in the comma-separated <list>; spaces are not allowed in the list.
- -s (--hide_semantic_types)
- disables the display of the semantic types of Metathesaurus concepts. By default, the semantic types of Metathesaurus concepts are displayed in square brackets for each concept in the candidate list and the mappings.
- -T (--tagger_output)
- displays the output of the MedPost/SKR tagger, with the input words on one line and their tags aligned on the line below.
- -v (--variants)
- displays the variants generated for each input word.
- -W (--preferred_name_sources)
- lists all sources for the preferred names of displayed concepts. Note that this is just one of many possible choices for showing sources; showing all sources for any synonym in a concept, for example, would often produce very cluttered output.
- -x (--syntax)
- controls the output form of the results of the SPECIALIST parser. It outputs a Prolog term showing details of the syntactic processing.
- -X (--truncate_candidates_mappings)
- first truncates the list of candidates to the 100 top-scoring ones before computing mappings and then truncates the list of mappings to the 8 top-scoring ones. This option can sometimes prevent a combinatorial explosion caused by computing a large number of mappings from a large number of candidates as is often encountered when using --allow_overmatches.
- --XMLf
- Formatted XML, one XML document per input record/citation
- --XMLn
- Unformatted XML, one XML document per input record/citation
- --XMLf1
- Formatted XML, one XML document per input file
- --XMLn1
- Unformatted XML, one XML document per input file
- --negex
- outputs a list of negated UMLS concepts occurring in the input and the strings that caused the negation.
- --no_header_info
- suppresses printing of informational messages at the beginning of a MetaMap session.
- --phrases_only
- (for debugging purposes only)
- --warnings
- (for debugging purposes only)
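The pipe-delimited lines written by -j (--dump_aas) can be parsed field by field using the layout given for that option. A minimal Python sketch, assuming each AA line has exactly the eight fields shown and that no field contains a "|" character; `AcronymRecord` and `parse_aa_line` are hypothetical names, and the sample line and its counts are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class AcronymRecord:
    pmid: str
    acronym: str
    expansion: str
    acronym_tokens: int
    acronym_chars: int
    expansion_tokens: int
    expansion_chars: int


def parse_aa_line(line: str) -> AcronymRecord:
    """Parse AA|PMID|Acronym|Expansion|#AcrTok|#AcrChars|#ExpTok|#ExpChars."""
    fields = line.rstrip("\n").split("|")
    if len(fields) != 8 or fields[0] != "AA":
        raise ValueError(f"not an AA line: {line!r}")
    _, pmid, acronym, expansion, a_tok, a_chr, e_tok, e_chr = fields
    return AcronymRecord(pmid, acronym, expansion,
                         int(a_tok), int(a_chr), int(e_tok), int(e_chr))


# Invented sample line, for illustration only.
rec = parse_aa_line("AA|12345678|MI|myocardial infarction|1|2|2|21")
print(rec.acronym, "->", rec.expansion)  # MI -> myocardial infarction
```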
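The effects of -r (--threshold) and -X (--truncate_candidates_mappings) on a ranked list can be modelled with two small functions. This is a sketch over a simplified list of (score, concept name) pairs, not MetaMap's own output format or internal code; the function names and the sample scores are hypothetical, while the limits of 100 candidates and 8 mappings come from the description of -X.

```python
from typing import List, Tuple

Candidate = Tuple[int, str]  # (evaluation score, concept name); higher is better

CANDIDATE_LIMIT = 100  # -X keeps the 100 top-scoring candidates ...
MAPPING_LIMIT = 8      # ... and later the 8 top-scoring mappings


def apply_threshold(cands: List[Candidate], threshold: int) -> List[Candidate]:
    """Keep candidates whose score equals or exceeds the threshold (cf. -r)."""
    return [c for c in cands if c[0] >= threshold]


def top_n(items: List[Candidate], n: int) -> List[Candidate]:
    """Keep the n top-scoring items, best first (cf. -X truncation)."""
    return sorted(items, key=lambda c: c[0], reverse=True)[:n]


# Hypothetical scores, for illustration only.
cands = [(694, "Attack"), (1000, "Myocardial Infarction"), (861, "Heart")]
print(apply_threshold(cands, 800))  # [(1000, 'Myocardial Infarction'), (861, 'Heart')]
print(top_n(cands, 1))              # [(1000, 'Myocardial Infarction')]
```

Truncating before mappings are built is what limits the combinatorial growth: the number of candidate combinations MetaMap must consider grows with the size of the candidate list.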
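Several options above (-e, -R, -J, -k) take a comma-separated list in which spaces are not allowed, and -J and -k have opposite effects on semantic types. The Python sketch below shows both points under stated assumptions: `list_arg` and `filter_by_sts` are hypothetical helpers, semantic types are assumed to be given by their UMLS abbreviations (e.g., dsyn for Disease or Syndrome), and "having a semantic type in the list" is taken to mean sharing at least one type with it.

```python
from typing import Dict, Iterable, List, Optional, Set


def list_arg(items: Iterable[str]) -> str:
    """Join items into a comma-separated list argument with no spaces."""
    items = list(items)
    for item in items:
        if not item or "," in item or any(ch.isspace() for ch in item):
            raise ValueError(f"invalid list item: {item!r}")
    return ",".join(items)


def filter_by_sts(concepts: Dict[str, Set[str]],
                  restrict: Optional[Set[str]] = None,
                  exclude: Optional[Set[str]] = None) -> List[str]:
    """Model -J (keep only listed types) and -k (drop listed types)."""
    kept = []
    for name, sts in concepts.items():
        if restrict is not None and not (sts & restrict):
            continue  # -J: no semantic type in the list
        if exclude is not None and (sts & exclude):
            continue  # -k: has a semantic type in the list
        kept.append(name)
    return kept


concepts = {"Myocardial Infarction": {"dsyn"}, "Heart": {"bpoc"}, "Chest Pain": {"sosy"}}
print(list_arg(["dsyn", "sosy"]))                 # dsyn,sosy
print(filter_by_sts(concepts, exclude={"bpoc"}))  # ['Myocardial Infarction', 'Chest Pain']
```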
Miscellaneous Options:
- --help
- displays MetaMap usage, i.e., the form of the command and a list of all options. This option has no short form, and must therefore be invoked as --help.