Archives files necessary for using an additional data set

For each supported UMLS release, MetaMap provides three data subsets (Base, USAbase, and NLM), each subset is derived from a subset of the UMLS Metathesaurus; these subsets are explained in Section 3 of MetaMap 2011 Release Notes ( Each of these three data subsets are available in a both strict and relaxed models see Note on Model Documentation at the end of this page.

To use any model for a particular release you need two archive files: The base archive for the release and the archive file for the particular model you want to use . If you wish to use more than one model you only need to download the base archive file once.

If you wish to use the relaxed model for Base data set 2012AA it is necessary to download the base archive file: public_mm_data_base_2012aa_base.bz2 and the relaxed model archive: public_mm_data_base_2012aa_relaxed.bz2. You will also need the file public_mm_data_dblexicon_2012.tar.bz2 that provides the 2012 version of the SPECIALIST Lexicon required for the 2012aa data set.

Installing the data sets for additional models

After downloading the necessary archive files for additional data set, first move to the directory containing the directory of the existing public_mm installation and then extract the additional data set using bzip2 and tar.

For example, to add the 2012AA Base relaxed model to your MetaMap installation:

$ cd <directory containing existing public_mm installation>
$ tar xvfj public_mm_data_dblexicon_2012.tar.bz2
$ tar xvfj public_mm_data_base_2012aa_base.tar.bz2
$ tar xvfj public_mm_data_base_2012aa_relaxed.tar.bz2

The 2012aa data-files should be installed now, to use the new data set run MetaMap with the options -Z <4 digit year and 2 letter release> -V <UMLS subset> --<model name>_model in the current case the options -Z 2012AA -V Base --relaxed_model will suffice:

$ cd public_mm
$ echo "lung cancer" | ./bin/metamap13 -Z 2012AA -V Base --relaxed_model

The output should be similar to this:

metamap13.BINARY.Linux (2013)

Control options:
Processing 00000000.tx.1: lung cancer

Phrase: "lung cancer"
Meta Candidates (Total=10; Excluded=3; Pruned=0; Remaining=7)
  1000   Lung Cancer (Malignant neoplasm of lung) [Neoplastic Process]
  1000   LUNG CANCER (Carcinoma of lung) [Neoplastic Process]
   861   Cancer (Malignant Neoplasms) [Neoplastic Process]
   861   Lung [Body Part, Organ, or Organ Component]
   861   Lung (Lung diseases) [Disease or Syndrome]
   861   Cancer (Cancer Genus) [Eukaryote]
   861   Cancer (Specialty Type - cancer) [Biomedical Occupation or Discipline]
   805 E Pulmonary (Pulmonary (qualifier value)) [Qualitative Concept]
   768 E Pneumonia [Disease or Syndrome]
   768 E Pulmonary Arteries (Pulmonary artery structure) [Body Part, Organ, or Organ Component]
Meta Mapping (1000):
  1000   LUNG CANCER (Carcinoma of lung) [Neoplastic Process]
Meta Mapping (1000):
  1000   Lung Cancer (Malignant neoplasm of lung) [Neoplastic Process]

Archive Filename Organization

The archive files are named using the following nomenclature:

public_mm_data_{UMLS subset}_{UMLS year}{UMLS release}_{model}.tar.bz2

On Windows:

public_mm_data_win32_{UMLS subset}_{UMLS year}{UMLS release}_{model}.7z

public_mm_data_win32_{UMLS subset}_{UMLS year}{UMLS release}_{model}.zip

A description of the segments of the archive name follows:

UMLS subset
Subset of UMLS Meta-Thesaurus (Base, USAbase, and NLM)
UMLS year
The year the UMLS Meta-Thesaurus release was created.
UMLS release
AA - first release, AB - second release

Below are the MetaMap UMLS subsets and the corresponding UMLS Metathesaurus subsets used to generate them:

Level 0 vocabularies only.
Level 0 vocabularies + SNOMEDCT
   All vocabularies except (CDT, CPT, and other HCPCS vocabularies)

See MetamorphoSys Help for additional information:

Note: Model Documentation

For a description of the content of each model, see the paper: "Filtering the UMLS Metathesaurus for MetaMap" at the SKR website under "Research Information" (