Lexical Tools

Application: MetaMap Norm

I. Objective

To find a term in a corpus (LEXICON) by approximate match with customized normalization.

This example shows how to use customized normalization to find a term in LEXICON (the corpus) by approximate match, that is, by retrieving the entries that share the same normalized form as the input. The MetaMap project applies this method to retrieve the lexical items in LEXICON that correspond to an input term. Three steps are involved:

  • MetaMapNorm (customized normalization):
    MetaMap normalization abstracts away from punctuation and case. It removes genitives, removes the parenthetic plural forms (s), (es), and (ies), replaces punctuation with spaces, and lowercases the result (lvg -f:g:rs:o:l).

  • Generate index file:
    A preprocessing step that applies MetaMapNorm to all inflections in LEXICON and saves the results in a file with the following format:

    key                values
    Normalized term    Inflectional terms

  • Get Lexical term from input:
    • Normalize the input term
    • Load the index file into a hash table, with normalized terms as keys and their inflectional terms as values
    • Retrieve inflectional terms by matching normalized terms
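
The three steps above can be sketched in plain Java as follows. This is only a minimal illustration, not the code shipped in the package: the class name MetaMapNormSketch and its methods are hypothetical, the regular expressions merely approximate the lvg flows g, rs, o, and l rather than calling the LVG API, the whitespace collapsing is an assumption, and the index is kept in memory instead of being written to and read from an index file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; approximates "lvg -f:g:rs:o:l" with regular expressions.
class MetaMapNormSketch {

    // Step 1: customized normalization (approximation, not the LVG implementation).
    static String normalize(String term) {
        String s = term;
        // g: remove genitive markers, e.g. "Down's" -> "Down", "patients'" -> "patients"
        s = s.replaceAll("(?i)'s\\b", "");
        s = s.replaceAll("s'(?=\\s|$)", "s");
        // rs: remove parenthetic plural forms (s), (es), (ies)
        s = s.replaceAll("(?i)\\((s|es|ies)\\)", "");
        // o: replace punctuation with spaces
        s = s.replaceAll("\\p{Punct}", " ");
        // l: lowercase
        s = s.toLowerCase(Locale.ROOT);
        // Assumption: collapse runs of whitespace so keys compare cleanly
        return s.trim().replaceAll("\\s+", " ");
    }

    // Step 2: build the index (normalized term -> inflectional terms).
    static Map<String, List<String>> buildIndex(List<String> inflections) {
        Map<String, List<String>> index = new HashMap<>();
        for (String inflection : inflections) {
            index.computeIfAbsent(normalize(inflection), k -> new ArrayList<>())
                 .add(inflection);
        }
        return index;
    }

    // Step 3: normalize the input and retrieve the matching inflectional terms.
    static List<String> lookup(Map<String, List<String>> index, String input) {
        List<String> hits = index.get(normalize(input));
        return hits == null ? Collections.<String>emptyList() : hits;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the inflections stored in LEXICON
        List<String> inflections = Arrays.asList(
            "2-compartment model", "2-compartment models", "Down's syndrome");
        Map<String, List<String>> index = buildIndex(inflections);

        String input = "2 compartment models";
        System.out.println("-- input: [" + input + "]");
        System.out.println("-- Output: " + lookup(index, input));
    }
}
```

With the sample data above, the input "2 compartment models" and the stored inflection "2-compartment models" both normalize to the same key, so the lookup returns the hyphenated lexicon form.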

II. Prerequisites

Install the lvg.${YEAR} package to "/Projects/LVG/lvg${YEAR}"

III. Source Code

  • MetaMapNorm.java
  • GenIndexFile.java
  • GetLexFormInput.java

IV. Compile

shell> cd ${MetaMapNorm}
shell> ant

V. Run & Results

shell> cd bin
shell> 3.GetLexFormInput 
----------- Program starts -----------
-- input: [2 compartment models]
-- Output: [2-compartment models]
----------- Program ends -----------

VI. Application Package Download

The whole package, MetaMapNorm.tgz, can be downloaded here.