
Natural Language Processing

MetaMapLite

Overview:

The primary goal of MetaMapLite is to provide a near real-time named-entity recognizer that is not as rigorous as MetaMap but is much faster, while allowing users to customize and augment its behavior for specific purposes.

MetaMapLite uses some of the tables originally developed for MetaMap. Currently, MetaMapLite does not support dynamic variant generation. Named entities are found using longest match. Restriction by UMLS source and semantic type is optional. Part-of-speech tagging, which improves precision by a small amount at the cost of speed, is also optional. Negation detection is available using either Wendy Chapman's ConText or a native negation detection algorithm based on Wendy Chapman's NegEx, which is somewhat less effective but faster.
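Longest match means that, at each position in the text, the lookup prefers the longest run of tokens that matches a dictionary term over any shorter term starting at the same place. The following is a minimal Java sketch of that idea, not MetaMapLite's own lookup code: the class LongestMatchSketch, the in-memory Map dictionary, the pre-split tokens, and the concept identifiers (C_LUNG and so on) are all hypothetical stand-ins for MetaMapLite's inverted-file indexes. This greedy left-to-right scan is one way to realize longest match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of greedy longest-match lookup; not MetaMapLite's EntityLookup code.
class LongestMatchSketch {

    /** A matched token span [start, end) and the (made-up) concept id it maps to. */
    static final class Match {
        final int start;
        final int end;
        final String term;
        final String conceptId;

        Match(int start, int end, String term, String conceptId) {
            this.start = start;
            this.end = end;
            this.term = term;
            this.conceptId = conceptId;
        }

        @Override
        public String toString() {
            return "[" + start + "," + end + ") " + term + " -> " + conceptId;
        }
    }

    /**
     * Scans tokens left to right. At each position it tries the longest window
     * (up to maxLen tokens) first and shrinks it until a dictionary entry matches.
     */
    static List<Match> findLongestMatches(List<String> tokens, Map<String, String> dictionary, int maxLen) {
        List<Match> matches = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            Match found = null;
            int limit = Math.min(tokens.size(), i + maxLen);
            for (int end = limit; end > i && found == null; end--) {
                // Dictionary keys are assumed to be lowercase, space-separated terms.
                String candidate = String.join(" ", tokens.subList(i, end)).toLowerCase();
                String conceptId = dictionary.get(candidate);
                if (conceptId != null) {
                    found = new Match(i, end, candidate, conceptId);
                }
            }
            if (found != null) {
                matches.add(found);
                i = found.end; // continue after the matched span
            } else {
                i++;           // no term starts here; advance one token
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = Map.of(
                "lung", "C_LUNG",
                "lung cancer", "C_LUNG_CANCER",
                "cancer", "C_CANCER");
        List<String> tokens = Arrays.asList("Patient", "has", "lung", "cancer");
        for (Match m : findLongestMatches(tokens, dictionary, 5)) {
            System.out.println(m);
        }
    }
}
```

For the sample sentence, main reports a single match for "lung cancer" rather than separate matches for "lung" and "cancer".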

Prerequisites:

Downloads

MetaMapLite 3.6.2rc8 and UMLS 2022

To use the USAbase (Level 0+4+9) dataset, extract the archive public_mm_lite_3.6.2rc8_binaryonly.zip and the dataset archive public_mm_data_lite_usabase_2022aa.zip in the same directory:
$ unzip public_mm_lite_3.6.2rc8_binaryonly.zip
$ unzip public_mm_data_lite_usabase_2022aa.zip

To use the Base (Level 0) dataset, extract the archive public_mm_lite_3.6.2rc8_binaryonly.zip and the dataset archive public_mm_data_lite_base_2022aa.zip in the same directory:
$ unzip public_mm_lite_3.6.2rc8_binaryonly.zip
$ unzip public_mm_data_lite_base_2022aa.zip

MetaMapLite 3.6.2rc6 and UMLS 2020

  • MetaMapLite 3.6.2rc6 binary-only version: contains MetaMapLite sources, jar files, and configuration, but no UMLS dataset (WinZip, 250 MB), [sha1sum], [md5sum]
  • 2020AB UMLS Level 0+4+9 Dataset (WinZip, 1 GB), [sha1sum], [md5sum]
  • 2020AA UMLS Level 0+4+9 Dataset (WinZip, 1 GB), [sha1sum], [md5sum]

    Note to users who downloaded the 2020AA USAbase dataset distribution before May 15th: the 2020AA USAbase dataset previously published on this website was missing the SNOMEDCT_US vocabulary. The affected archives have the following checksums:

    md5sum: aacca5e1e3a3791a5ecd8f4d91473cd2  public_mm_data_lite_usabase_2020aa.7z
    sha1sum: 675ec4545373b156a04712b3ca72fcdeab90fc6d  public_mm_data_lite_usabase_2020aa.7z
    md5sum: 000fac4b1be197f86386e4e5e1dabb49  public_mm_data_lite_usabase_2020aa.zip
    sha1sum: 1c0a16bdeb5560ce40d7a8be5333aeb0a8cfa2a5  public_mm_data_lite_usabase_2020aa.zip
    		
    The archives have been replaced with ones containing the SNOMEDCT_US vocabulary.

  • 2020AA UMLS Level 0 Dataset (WinZip, 877 MB), [sha1sum], [md5sum]
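Checking whether a previously downloaded archive is one of the affected copies comes down to comparing its digest with the checksums in the note above. On most Unix-like systems the md5sum and sha1sum commands do this directly; the hedged Java sketch below does the same with the standard java.security.MessageDigest API. The class name AffectedArchiveCheck and its methods are hypothetical, and the default file name in main is only an example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Set;

// Hypothetical helper; checksum values are copied from the note above.
class AffectedArchiveCheck {

    // Checksums of the 2020AA USAbase archives that were missing SNOMEDCT_US.
    static final Set<String> AFFECTED = Set.of(
            "aacca5e1e3a3791a5ecd8f4d91473cd2",          // md5, .7z
            "675ec4545373b156a04712b3ca72fcdeab90fc6d",  // sha1, .7z
            "000fac4b1be197f86386e4e5e1dabb49",          // md5, .zip
            "1c0a16bdeb5560ce40d7a8be5333aeb0a8cfa2a5"); // sha1, .zip

    /** Lowercase hex digest of a file; algorithm is "MD5" or "SHA-1". */
    static String digestHex(Path file, String algorithm) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n); // stream the file so large archives are not loaded into memory
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    /** True if the file's MD5 or SHA-1 matches one of the affected archives. */
    static boolean isAffected(Path file) throws IOException, NoSuchAlgorithmException {
        return AFFECTED.contains(digestHex(file, "MD5"))
                || AFFECTED.contains(digestHex(file, "SHA-1"));
    }

    public static void main(String[] args) throws Exception {
        Path archive = Paths.get(args.length > 0 ? args[0] : "public_mm_data_lite_usabase_2020aa.zip");
        if (!Files.exists(archive)) {
            System.out.println("Not found: " + archive);
        } else if (isAffected(archive)) {
            System.out.println(archive + " is an affected archive; download the replacement.");
        } else {
            System.out.println(archive + " does not match the affected checksums.");
        }
    }
}
```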
To use MetaMapLite, extract the archive public_mm_lite_3.6.2rc6_binaryonly.zip and a dataset archive (public_mm_data_lite_base_2020aa.zip or public_mm_data_lite_usabase_2020aa.zip) in the same directory:
$ unzip public_mm_lite_3.6.2rc6_binaryonly.zip
$ unzip public_mm_data_lite_base_2020aa.zip
	    
Change to the 'public_mm_lite' directory and use the "--indexdir" option to specify the location of the dataset (shown here as a relative path), replacing file with the name of the input file to process:
$ cd public_mm_lite
$ ./metamaplite.sh --indexdir=data/ivf/2020AA/Base file
	    
The path for the level 0+4+9 dataset is data/ivf/2020AA/USAbase.

MetaMapLite 3.6.2rc5 and 2020AA datasets

  • MetaMapLite 3.6.2rc5 binary-only version: contains MetaMapLite sources, jar files, and configuration, but no UMLS dataset (WinZip, 250 MB), [sha1sum], [md5sum]
  • 2020AA UMLS Level 0+4+9 Dataset (WinZip, 1 GB), [sha1sum], [md5sum]

  • 2020AA UMLS Level 0 Dataset (WinZip, 877 MB), [sha1sum], [md5sum]
To use MetaMapLite, extract the archive public_mm_lite_3.6.2rc5_binaryonly.zip and a dataset archive (public_mm_data_lite_base_2020aa.zip or public_mm_data_lite_usabase_2020aa.zip) in the same directory:
$ unzip public_mm_lite_3.6.2rc5_binaryonly.zip
$ unzip public_mm_data_lite_base_2020aa.zip
	    
Change to the 'public_mm_lite' directory and use the "--indexdir" option to specify the location of the dataset (shown here as a relative path), replacing file with the name of the input file to process:
$ cd public_mm_lite
$ ./metamaplite.sh --indexdir=data/ivf/2020AA/Base file
	    
The path for the level 0+4+9 dataset is data/ivf/2020AA/USAbase.

MetaMapLite 3.6.2rc3 and 2018AB datasets

To use MetaMapLite, extract the archive public_mm_lite_3.6.2rc3_binaryonly.zip and a dataset archive (public_mm_data_lite_base_2018ab_ascii.zip or public_mm_data_lite_usabase_2018ab_ascii.zip) in the same directory:
$ unzip public_mm_lite_3.6.2rc3_binaryonly.zip
$ unzip public_mm_data_lite_base_2018ab_ascii.zip
	    
Change to the 'public_mm_lite' directory and use the "--indexdir" option to specify the location of the dataset (shown here as a relative path), replacing file with the name of the input file to process:
$ cd public_mm_lite
$ ./metamaplite.sh --indexdir=data/ivf/2018ABascii/Base file
	    
The path for the level 0+4+9 dataset is data/ivf/2018ABascii/USAbase.

MetaMapLite 3.6.2rc3

The 3.6.2rc3 version of MetaMapLite is a release candidate for version 3.6.2 with the following changes:

  • Fixed an error in tokenization when calling OpenNLP's part-of-speech tagger.
  • Merged UTF-8 handling code from the UTF branch into master.

MetaMapLite 3.6.2rc2

The 3.6.2rc2 version of MetaMapLite is a release candidate for version 3.6.2 that fixes the following issues:

  • When using EntityLookup4 (i.e., setting metamaplite.enable.scoring = false), disabling part-of-speech tagging (i.e., setting metamaplite.enable.postagging = false) significantly reduces the number of entities found. In one reported case, the median number of entities per document on the same collection dropped from 50 (with postagging = true) to 0 (with postagging = false).
  • When using MetaMapLite, EntityLookup4 is initialized every time processDocumentList is called and again each time processDocument is called, whereas EntityLookup5 is re-initialized only when needed.
  • When using a non-standard data directory, the property opennlp.en-pos.bin.path (for example, $DATA_DIR/models/en-pos-maxent.bin) must be set. This property is not supplied in the template config file, so MetaMapLite falls back to a hard-coded default value, which results in a crash. Adding this property to the generated config file would let users who customize their data directory know to adjust it accordingly.
  • When using a non-standard data directory, the following properties must be set for MMI file output; otherwise, null pointer exceptions are thrown:
    • metamaplite.index.directory: $DATA_DIR/ivf/2017AA/Base/strict/indices/
    • metamaplite.ivf.meshtcrelaxedindex: $DATA_DIR/ivf/2017AA/Base/strict/indices/meshtcrelaxed
    These properties are not supplied in the template config file, which leads to the null pointer exceptions. Adding them to the generated config file would make this requirement visible to users.
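Because these missing properties only surface at run time as a crash or a null pointer exception, a configuration file can be checked before starting a run. The sketch below assumes the configuration is a standard Java properties file (the "key: value" form used above is valid properties syntax) and lists any of the three property names from these notes that are absent or empty. The class ConfigPreflight is hypothetical and is not part of MetaMapLite.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight check; not part of MetaMapLite.
class ConfigPreflight {

    // Property names taken from the 3.6.2rc2 notes; needed with a non-standard data directory.
    static final String[] REQUIRED_FOR_CUSTOM_DATA_DIR = {
        "opennlp.en-pos.bin.path",
        "metamaplite.index.directory",
        "metamaplite.ivf.meshtcrelaxedindex"
    };

    /** Returns the required keys that are absent or empty in props, in declaration order. */
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_FOR_CUSTOM_DATA_DIR) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: ConfigPreflight <config.properties>");
            return;
        }
        Properties props = new Properties();
        try (Reader reader = new FileReader(args[0])) {
            props.load(reader); // accepts both key=value and key: value lines
        }
        List<String> missing = missingKeys(props);
        System.out.println(missing.isEmpty() ? "All required properties are set." : "Missing: " + missing);
    }
}
```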

MetaMapLite 3.6.1p1

The 3.6.1p1 version of MetaMapLite is a bugfix release that fixes the following issue:

  • Fixes an error where the docid was not propagated to Entity records in the output results.

MetaMapLite 3.6.1

The 3.6.1 version of MetaMapLite is a bugfix release that fixes the following issue:

  • Fixes an error in the method that removes entities subsumed by a larger entity: some entities that were not subsumed were also being removed.
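The fix concerns which entities count as subsumed. A minimal Java sketch of the intended rule follows, under the assumption that an entity is subsumed when another entity's character span fully contains it and is strictly longer; entities that only partially overlap, or that share exactly the same span, are kept. The Span class and the pairwise (quadratic) comparison are illustrative choices, not MetaMapLite's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of subsumed-entity removal; not MetaMapLite's own method.
class SubsumptionFilter {

    /** An entity's character span [start, end) and covered text. */
    static final class Span {
        final int start;
        final int end;
        final String text;

        Span(int start, int end, String text) {
            this.start = start;
            this.end = end;
            this.text = text;
        }

        int length() {
            return end - start;
        }
    }

    /** outer subsumes inner if it covers inner completely and is strictly longer. */
    static boolean subsumes(Span outer, Span inner) {
        return outer.start <= inner.start
                && inner.end <= outer.end
                && outer.length() > inner.length();
    }

    /**
     * Keeps every entity that no other entity subsumes. Partially overlapping
     * entities and entities with identical spans are all kept.
     */
    static List<Span> removeSubsumed(List<Span> entities) {
        List<Span> kept = new ArrayList<>();
        for (Span candidate : entities) {
            boolean subsumed = false;
            for (Span other : entities) {
                if (subsumes(other, candidate)) {
                    subsumed = true;
                    break;
                }
            }
            if (!subsumed) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Offsets assume the text "...lung cancer screening..." starting at position 10.
        List<Span> entities = Arrays.asList(
                new Span(10, 21, "lung cancer"),
                new Span(10, 14, "lung"),
                new Span(15, 21, "cancer"),
                new Span(15, 31, "cancer screening"));
        for (Span s : removeSubsumed(entities)) {
            System.out.println(s.text);
        }
    }
}
```

Here "lung" and "cancer" are removed, while "lung cancer" and "cancer screening" both survive because neither contains the other.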

MetaMapLite 3.6

The 3.6 version of MetaMapLite is a bugfix release that fixes the following issues:

  • Fixes an error in the longest match algorithm in which entities subsumed by a longer entity were not removed.
  • Includes an example of creating a result formatter.
  • Readme documentation has been updated.

MetaMapLite 3.5

The 3.5 version of MetaMapLite is a bugfix release that fixes the following issues:

  • The negation status of a concept was not reflected in the MMI fielded output.
  • The location of the chunker model file was not user-modifiable.
  • The default properties file was missing a reference to the treecodes file used for MMI fielded output.
  • Readme documentation has been updated.

MetaMapLite 3.4

Version 3.4 of MetaMapLite optionally adds MetaMap-style scoring: concept mapping results are scored for BRAT output, and MMI output includes ranked indexing results computed with MetaMap's Ranked Indexing algorithm. MMI results may differ somewhat from MetaMap's because MetaMapLite's mapping scores, which are the input to the MMI Ranked Indexing algorithm, differ from MetaMap's.

MetaMapLite 2016 3.1 SNAPSHOT

MetaMapLite 2016 3.0 SNAPSHOT

Example MetaMapLite Servlet

Documentation

MetaMapLite README Documentation

MetaMapLite Source Code

Publications

MetaMap Lite: an evaluation of a new Java implementation of MetaMap. Demner-Fushman D, Rogers WJ, Aronson AR. JAMIA. Volume 24, Issue 4, July 2017. DOI: 10.1093/jamia/ocw177. URL: https://academic.oup.com/jamia/issue/24/4. ALT URL: https://www.ncbi.nlm.nih.gov/pubmed/28130331.

Sources

The source code for MetaMapLite is supplied with the distribution in the directory public_mm_lite/src. It is also available on the MetaMapLite GitHub page.

Terms and Conditions

1. Informational Notice:

This software, "MetaMap and MetaMap Tools," was developed and funded by the National Library of Medicine, part of the National Institutes of Health, an agency of the United States Department of Health and Human Services, which is making the software available to the public for any commercial or non-commercial purpose under the following open-source BSD license.

NOTE: Users of the data distributed with MetaMap and MetaMap Tools are responsible for compliance with the UMLS Metathesaurus License Agreement which requires you to respect the copyrights of the constituent vocabularies and to file a brief annual report on your use of the UMLS. You also must have activated a UMLS Terminology Services (UTS) account.

2. LICENSE:

Government Usage Rights Notice: The U.S. Government retains unlimited, royalty-free usage rights to this software, but not ownership, as provided by Federal law. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

3. Use of MetaMap and MetaMap Tools

  • Redistributions of source code must retain this Informational Notice.

  • Redistributions in binary form must reproduce this Informational Notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

  • Neither the names of the National Library of Medicine, the National Institutes of Health, nor the names of any of the software developers may be used to endorse or promote products derived from this software without specific prior written permission.

  • The U.S. Government retains an unlimited, royalty-free right to use, distribute or modify the software.

  • Please acknowledge NLM as the source of the MetaMap software by including the phrase "Courtesy of the U.S. National Library of Medicine" or "Source: U.S. National Library of Medicine."

THIS SOFTWARE IS PROVIDED BY THE U.S. GOVERNMENT AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE U.S. GOVERNMENT OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.