Lexical Tools

Unit Tests on Flow Components

This page describes sample test data for the unit tests of all implemented flow components.

0 Strip NEC and NOS
a Generate acronym expansions
A Generate acronym
An AntiNorm
b Uninflect a term
B Uninflect words in a term
Bn Normalize uninflected words in a term
c Tokenize
ca Tokenize keep everything
ch Tokenize without breaking hyphens
C Canonicalize
Ct Retrieve citation form
d Derivation
dc Derivation by categories
e Generate base spelling variants
E Retrieve EUI
f Filter, output only terms contained in the Lexicon
fa Filter out acronyms and abbreviations
fp Filter out proper nouns
g Remove genitive
G Generate Fruitful variants
Ge Generate Fruitful variants, enhanced
Gn Generate known Fruitful variants
i Inflection
ici Inflection by categories and inflections
is Inflection with simple inflections
l Lowercase
L Retrieve categories and inflections
Ln Retrieve categories and inflections from Lexicon database
Lp Retrieve categories and inflections for terms beginning with a given word
m Generate Metaphone form
n No operation
nom Nominalization
N Normalize text in a non-canonical form
N3 LuiNorm, normalize and canonicalize a term
o Replace punctuation with space
p Strip punctuation
P Strip punctuation enhanced
q Strip diacritics
q0 Map Symbols & Punctuation to ASCII
q1 Map Unicode to ASCII
q2 Split ligatures
q3 Get Unicode names
q4 Get Unicode base Synonym
q5 Normalize Unicode to ASCII
q6 Normalize Unicode to ASCII with Synonym Option
q7 Unicode Core Norm
q8 Strip or Map Unicode to ASCII
r Generate synonyms, recursively
rs Remove parenthetic plural forms
R Generate derivations, recursively
s Generate spelling variants
S Syntactic Uninvert
Si Map inflections into simple inflections
t Strip stop words
T Strip ambiguity tags
u Uninvert phrases around commas
U Convert output from the Xerox PARC stochastic tagger into Lvg-style pipe-delimited format
v Retrieve fruitful variants from database
w Sort words by order
ws Filter words by word size
y Synonym
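
As a rough illustration of what sample test data for a flow component could look like, the sketch below pairs a flow flag with an input term and an expected output, then checks each pair. It is not the Lexical Tools implementation: the three functions are hypothetical stand-ins that only approximate the one-line descriptions of l, o, and p above, and the sample terms are made up for this example.

```python
import string

# Hypothetical stand-ins for three of the simpler flow components listed above.
# These only approximate the one-line descriptions of l, o, and p; they are
# not the actual Lexical Tools code.

def lowercase(term: str) -> str:
    """Approximates flow l: lowercase the whole term."""
    return term.lower()

def replace_punctuation_with_space(term: str) -> str:
    """Approximates flow o: each ASCII punctuation character becomes a space."""
    return "".join(" " if ch in string.punctuation else ch for ch in term)

def strip_punctuation(term: str) -> str:
    """Approximates flow p: ASCII punctuation characters are dropped."""
    return "".join(ch for ch in term if ch not in string.punctuation)

# Flow flag -> stand-in function (flags taken from the list above).
FLOWS = {
    "l": lowercase,
    "o": replace_punctuation_with_space,
    "p": strip_punctuation,
}

# Made-up sample test data: (flow flag, input term, expected output).
SAMPLE_CASES = [
    ("l", "Left Ventricle", "left ventricle"),
    ("o", "in-vitro", "in vitro"),
    ("p", "in-vitro", "invitro"),
]

def run_cases(cases):
    """Run every case and return the list of failures (empty if all pass)."""
    failures = []
    for flag, term, expected in cases:
        actual = FLOWS[flag](term)
        if actual != expected:
            failures.append((flag, term, expected, actual))
    return failures

if __name__ == "__main__":
    failed = run_cases(SAMPLE_CASES)
    print("all cases passed" if not failed else failed)
```

Keeping the cases in a table keyed by flow flag means test data for another flow can be added as one more row, without changing the runner.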