Lexical Tools

Application: Normalization

I. Objective

To use the Lexical Tools Norm API to normalize an input term.

In NLP applications, words and terms often need to be normalized before they are indexed in, or queried from, a database. Users may define and compose their own normalization to suit their requirements (see MetaMap Norm). Lexical Tools provides a very thorough normalization. It abstracts away from case, inflection, and word order; removes stop words and possessives; replaces punctuation with spaces; removes the parenthetic plural forms (s), (es), (ies), (S), (ES), and (IES); and maps non-ASCII Unicode characters to ASCII. This example illustrates how to use the Norm API in an application.
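The following minimal, self-contained sketch approximates several of these steps with the Java standard library. It is not how LVG implements Norm. The class `SimpleNorm` and its stop-word list are hypothetical. Inflection handling, which needs the LVG lexicon, is omitted. Non-ASCII mapping is approximated by stripping accents with `java.text.Normalizer`.

```java
import java.text.Normalizer;
import java.util.*;

// Hypothetical, simplified approximation of some Norm steps.
// NOT the LVG implementation: inflection (e.g. "left" -> "leave") is omitted.
class SimpleNorm
{
    // hypothetical stop-word sample, for illustration only
    private static final Set<String> STOP_WORDS = new HashSet<>(
        Arrays.asList("of", "and", "with", "for", "to", "in", "by", "on", "the"));

    static String normalize(String term)
    {
        // case: fold to lower case, so (S), (ES), (IES) become (s), (es), (ies)
        String s = term.toLowerCase(Locale.ROOT);
        // parenthetic plural forms: remove (s), (es), (ies)
        s = s.replaceAll("\\((s|es|ies)\\)", "");
        // possessives: "patient's" -> "patient", "patients'" -> "patients"
        s = s.replaceAll("'s\\b", "").replaceAll("s'(?=\\s|$)", "s");
        // non-ASCII: decompose accented letters, then drop the combining marks
        s = Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
        // punctuation (and any other remaining non-alphanumerics) -> space
        s = s.replaceAll("[^a-z0-9]+", " ");
        // stop words: drop them while splitting into words
        List<String> words = new ArrayList<>();
        for (String w : s.trim().split("\\s+"))
        {
            if (!w.isEmpty() && !STOP_WORDS.contains(w))
            {
                words.add(w);
            }
        }
        // word order: sort alphabetically so the result ignores the original order
        Collections.sort(words);
        return String.join(" ", words);
    }
}
```

For example, `SimpleNorm.normalize("The Patient's Left-Arm(s)")` returns `"arm left patient"`. Sorting the words makes the result independent of the original word order.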

II. Prerequisites
Install the lvg.${YEAR} package to "/Projects/LVG/lvg${YEAR}".
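The config file passed to `NormApi` is located under the installation directory at `data/config/lvg.properties`, matching the path used in the source code below. The following sketch derives that path from the install directory and fails early if the file is missing. It assumes the installation follows that layout, and `LvgConfig` is a hypothetical helper, not part of LVG.

```java
import java.nio.file.*;

// Hypothetical helper: locates lvg.properties under an LVG install directory,
// assuming the data/config/lvg.properties layout.
class LvgConfig
{
    // e.g. "/Projects/LVG/lvg2012" -> "/Projects/LVG/lvg2012/data/config/lvg.properties"
    static Path configFile(String installDir)
    {
        return Paths.get(installDir, "data", "config", "lvg.properties");
    }

    // Returns the config path, or throws if the file does not exist.
    static Path requireConfigFile(String installDir)
    {
        Path p = configFile(installDir);
        if (!Files.isRegularFile(p))
        {
            throw new IllegalStateException("lvg.properties not found: " + p);
        }
        return p;
    }
}
```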

III. Source Code

import java.util.*;
import gov.nih.nlm.nls.lvg.Api.*;

public class Normalization
{
    // test driver
    public static void main(String[] args)
    {
        // instantiate a NormApi object from the LVG config file;
        // change this path to match your LVG installation
        String lvgConfigFile
            = "/Projects/LVG/lvg2012/data/config/lvg.properties";
        NormApi normApi = new NormApi(lvgConfigFile);

        // the term to normalize
        String in = "left";
        try
        {
            // Mutate() returns all normalized forms of the input term
            Vector<String> outs = normApi.Mutate(in);

            // print each result as "input|normalized form"
            for (String out : outs)
            {
                System.out.println(in + "|" + out);
            }
        }
        catch (Exception e)
        {
            System.err.println("** ERR: " + e.toString());
        }
        finally
        {
            // clean up: release the resources held by NormApi,
            // even if Mutate() threw an exception
            normApi.CleanUp();
        }
    }
}
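The driver above normalizes a single term. When normalizing many terms, one reasonable approach is to create a single `NormApi`, reuse it for every term, and call `CleanUp()` once at the end. The following sketch assumes that `Mutate(String)` returns a `Vector<String>` and is declared to throw `Exception`, as its use above suggests. `TermNormalizer` and `BatchNorm` are hypothetical names. The interface lets the loop be tested without LVG installed.

```java
import java.util.*;

// Hypothetical interface shaped like NormApi.Mutate(String) as used above
// (a Vector<String> is a List<String>, so normApi::Mutate should fit).
interface TermNormalizer
{
    List<String> mutate(String term) throws Exception;
}

class BatchNorm
{
    // Normalizes every term with the same normalizer instance,
    // keeping the input order; duplicate terms are stored once.
    static Map<String, List<String>> normalizeAll(TermNormalizer norm, List<String> terms)
        throws Exception
    {
        Map<String, List<String>> results = new LinkedHashMap<>();
        for (String term : terms)
        {
            results.put(term, norm.mutate(term));
        }
        return results;
    }
}
```

With LVG on the classpath, this could be called as `BatchNorm.normalizeAll(normApi::Mutate, terms)`, followed by a single `normApi.CleanUp()` in a `finally` block.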

IV. Compile

shell>javac -classpath ../lib/lvg2012dist.jar Normalization.java

V. Run & Results

shell>java -classpath ./:../lib/lvg2012dist.jar:/Projects/LVG/lvg2012/ Normalization

left|left
left|leave

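=> The input term "left" is normalized to two forms, "left" and "leave": "left" is already an uninflected form (for example, the adjective), and it is also the past tense of the verb "leave".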
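Each output line has the form "input|normalized form", and one input can produce several lines. The following hypothetical helper groups such lines by input term, for example into left -> [left, leave]. This shape is convenient when building an index that stores every normalized form of a term. The helper is a sketch and is not part of LVG.

```java
import java.util.*;

// Hypothetical helper: groups "input|normalized form" lines by input term.
class NormOutputParser
{
    static Map<String, List<String>> parse(List<String> lines)
    {
        Map<String, List<String>> byInput = new LinkedHashMap<>();
        for (String line : lines)
        {
            // split on the last '|': Norm replaces punctuation with spaces,
            // so the normalized form itself should not contain '|'
            int bar = line.lastIndexOf('|');
            if (bar < 0)
            {
                continue; // skip lines that are not "input|output" pairs
            }
            String in = line.substring(0, bar);
            String out = line.substring(bar + 1);
            byInput.computeIfAbsent(in, k -> new ArrayList<>()).add(out);
        }
        return byInput;
    }
}
```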

VI. Application Package Download

The complete package, Normalization.tgz, can be downloaded here.