STMT Tutorial
STMT provides sub-term related functions for NLP projects. This page describes that functionality by walking through an example. Please refer to the design documents for how the algorithm works.
The CORPUS_FILE is ignored if the SYNONYM_FILE is specified. In that case, the key (1st field) of each SYNONYM_FILE entry is used as a term in the corpus.
In addition, users can create their own normalization in the Java StmtApi class public abstract Vector
Norm key | Synonym |
---|---|
dog | canine |
dog | puppy |
canine | K9 |
cat | feline |
feline | kitty |
dog and cat | pets |
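As a rough illustration of how such a synonym table could be held in memory, the sketch below parses key/synonym pairs into a map and treats the keys as the corpus terms. It is not taken from the STMT code: the class name `SynonymTableSketch`, the method names, and the pipe-delimited line format are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of STMT.
class SynonymTableSketch {
    // Parses "key|synonym" lines; the pipe delimiter is an assumption.
    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> table = new LinkedHashMap<>();
        for (String line : lines) {
            String[] fields = line.split("\\|");
            if (fields.length < 2) {
                continue; // skip lines without both fields
            }
            String key = fields[0].trim();
            String synonym = fields[1].trim();
            table.computeIfAbsent(key, k -> new ArrayList<>()).add(synonym);
        }
        return table;
    }

    // The keys (1st field) double as the corpus terms.
    static Set<String> corpusTerms(Map<String, List<String>> table) {
        return table.keySet();
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "dog|canine", "dog|puppy", "canine|K9",
            "cat|feline", "feline|kitty", "dog and cat|pets");
        Map<String, List<String>> table = parse(lines);
        System.out.println(corpusTerms(table));
        System.out.println(table.get("dog"));
    }
}
```

A `LinkedHashMap` is used only so that the keys keep the order in which they appear in the table; any map would do for lookups.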
The following examples illustrate basic functions of subterms:
Input: Dog and cat g and
Functions | Results |
---|---|
In Corpus | true |
The Longest Prefix | dog and cat |
Prefixes |
|
Subterms |
|
Subterm Synonym Substitutions |
|
Please note that prefix-related functions require a one-to-one normalization, such as LexItemNorm, to work properly.
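The prefix functions can be pictured with a small sketch. It is an illustration only, not STMT's implementation: the class `PrefixSketch` and its methods are hypothetical, the corpus is hard-coded from the norm keys of the synonym table, and simple lowercasing with whitespace cleanup stands in for a one-to-one normalization such as LexItemNorm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper, not part of STMT.
class PrefixSketch {
    // Corpus terms: the norm keys of the synonym table above.
    static final Set<String> CORPUS =
        Set.of("dog", "canine", "cat", "feline", "dog and cat");

    // Stand-in normalization (assumption): lowercase, single spaces.
    static String normalize(String term) {
        return term.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }

    static boolean inCorpus(String term) {
        return CORPUS.contains(normalize(term));
    }

    // Word-level prefixes of the term that are corpus terms, shortest first.
    static List<String> prefixesInCorpus(String term) {
        List<String> found = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (String word : normalize(term).split(" ")) {
            if (prefix.length() > 0) {
                prefix.append(' ');
            }
            prefix.append(word);
            if (CORPUS.contains(prefix.toString())) {
                found.add(prefix.toString());
            }
        }
        return found;
    }

    static Optional<String> longestPrefix(String term) {
        List<String> found = prefixesInCorpus(term);
        return found.isEmpty()
            ? Optional.empty()
            : Optional.of(found.get(found.size() - 1));
    }

    public static void main(String[] args) {
        System.out.println(inCorpus("Dog and cat"));
        System.out.println(longestPrefix("Dog and cat").orElse("(none)"));
        System.out.println(prefixesInCorpus("Dog and cat"));
    }
}
```

Because each word of the input maps to exactly one normalized word here, a prefix of the normalized term always corresponds to a prefix of the input, which is the property a one-to-one normalization provides.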
Subterm synonym substitution is the most complicated operation in STMT. It consists of the five steps described below (using the example above).
Step | Results |
---|---|
normTerm | dog and cat |
subterms |
|
subterm patterns |
|
synonym patterns |
|
synonym substitution permutations |
|
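To make the general idea of these steps more concrete, here is a simplified sketch. It is not STMT's algorithm and is not guaranteed to reproduce its output: the class `SubtermSubstitutionSketch` is hypothetical, normalization is plain lowercasing, only direct synonyms are used (so "dog" yields "canine" but not "K9"), and the subterm and pattern steps are folded into one recursion that, at each word position, either keeps the word or replaces a subterm starting there with one of its synonyms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of STMT.
class SubtermSubstitutionSketch {
    // In-memory copy of the synonym table above (norm key -> direct synonyms).
    static final Map<String, List<String>> SYNONYMS = new LinkedHashMap<>();
    static {
        SYNONYMS.put("dog", List.of("canine", "puppy"));
        SYNONYMS.put("canine", List.of("K9"));
        SYNONYMS.put("cat", List.of("feline"));
        SYNONYMS.put("feline", List.of("kitty"));
        SYNONYMS.put("dog and cat", List.of("pets"));
    }

    // Stand-in normalization (assumption): lowercase, single spaces.
    static String normalize(String term) {
        return term.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }

    // All terms obtained by substituting synonyms for non-overlapping subterms.
    static Set<String> substitutions(String term) {
        String norm = normalize(term);
        Set<String> out = new LinkedHashSet<>();
        expand(norm.split(" "), 0, new ArrayList<>(), out);
        out.remove(norm); // drop the variant with no substitution at all
        return out;
    }

    private static void expand(String[] words, int pos,
                               List<String> built, Set<String> out) {
        if (pos == words.length) {
            out.add(String.join(" ", built));
            return;
        }
        // Option 1: keep the current word unchanged.
        built.add(words[pos]);
        expand(words, pos + 1, built, out);
        built.remove(built.size() - 1);
        // Option 2: replace a subterm starting here with each of its synonyms.
        for (int end = pos + 1; end <= words.length; end++) {
            String subterm =
                String.join(" ", Arrays.copyOfRange(words, pos, end));
            for (String synonym : SYNONYMS.getOrDefault(subterm, List.of())) {
                built.add(synonym);
                expand(words, end, built, out);
                built.remove(built.size() - 1);
            }
        }
    }

    public static void main(String[] args) {
        substitutions("Dog and cat").forEach(System.out::println);
    }
}
```

Under these assumptions, "Dog and cat" produces six variants, for example "canine and feline" (both subterms replaced) and "pets" (the whole term replaced). Advancing past the replaced subterm before recursing is what keeps the substituted subterms from overlapping.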