Lexical Tools

LVG Transformations (Flow Components)

In lvg, individual transformations are represented by flow components, which are collected into flows. A flow has one or more components and applies them serially: the output of each component in the flow is the input to the next component. Lvg allows a single flow or multiple parallel flows.
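The serial and parallel behavior described above can be sketched in a few lines of Java. This is only an illustration of the idea, not the lvg API: the class `FlowSketch`, the `Component` interface, and the three toy components are hypothetical names invented for this example, and the toy components are far simpler than the real lowercase, strip-punctuation, and tokenize flow components.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical sketch; none of these names come from the lvg library.
class FlowSketch {
    /** A flow component maps one input term to zero or more output terms. */
    interface Component extends Function<String, List<String>> {}

    // Toy stand-ins for flow components (assumed behavior, not lvg's).
    static final Component LOWERCASE =
        t -> List.of(t.toLowerCase(Locale.ROOT));
    static final Component STRIP_PUNCTUATION =
        t -> List.of(t.replaceAll("\\p{Punct}", ""));
    static final Component TOKENIZE =
        t -> Arrays.asList(t.trim().split("\\s+"));

    /** Serial application: each component's output feeds the next one. */
    static List<String> runFlow(List<Component> flow, String input) {
        List<String> current = List.of(input);
        for (Component component : flow) {
            List<String> next = new ArrayList<>();
            for (String term : current) {
                next.addAll(component.apply(term));
            }
            current = next;
        }
        return current;
    }

    /** Parallel flows: every flow gets the same input; results stay separate. */
    static List<List<String>> runParallelFlows(List<List<Component>> flows,
                                               String input) {
        List<List<String>> results = new ArrayList<>();
        for (List<Component> flow : flows) {
            results.add(runFlow(flow, input));
        }
        return results;
    }

    public static void main(String[] args) {
        String input = "Left Atrium, NOS";

        // One flow with three components applied in order.
        System.out.println(
            runFlow(List.of(LOWERCASE, STRIP_PUNCTUATION, TOKENIZE), input));
        // prints [left, atrium, nos]

        // Two independent flows over the same input.
        System.out.println(runParallelFlows(
            List.of(List.of(LOWERCASE),
                    List.of(STRIP_PUNCTUATION, TOKENIZE)),
            input));
        // prints [[left atrium, nos], [Left, Atrium, NOS]]
    }
}
```

Modeling each component as a function from one term to a list of terms reflects that a transformation may produce several outputs for one input (for example, multiple variants), each of which is then passed to the next component in the flow.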

The Java version of LVG uses a new implementation design for transformations, LexItem, Category, Inflection, etc. The implemented flow components are listed in the table below. The Java class usage of each flow component is also provided.

Flag in C	Flag in Java	Feature Description
0	0	Strip NEC and NOS
a	a	Return known acronym expansions
A	A	Return known acronyms
	An	Return possible mapping terms (approximate match) in Lexicon
b	b	Uninflect a term
B	B	Uninflect words in a term
	Bn	Normalized uninflect words in a term
c	c	Tokenize a term into "words"
c:a	ca	Tokenize, keep everything
c:h	ch	Tokenize without breaking hyphens
C	C	Canonicalize
	Ct	Retrieve the lexical name (base=, BAS) form
d	d	Generate derivational variants
d:N	dc~LONG	Generate derivational variants, restricted to the specified output categories
e	e	Generate known uninflected from spelling variants
E	E	Retrieve the unique EUI for a term
f	f	Filter output to contain only forms from lexicon
f:a	fa	Filter out acronyms and abbreviations from the output
f:p	fp	Filter out proper nouns from the output
g	g	Remove genitive
G	G	Generate all fruitful variants
	Ge	Generate fruitful variants, enhanced
	Gn	Generate known fruitful variants
i	i	Generate inflectional variants
i:N:N	ici~LONG+LONG	Generate inflectional variants, restricted to the specified output categories and output inflections
	is	Generate inflectional variants with simple inflections
l	l	Lowercase
L	L	Retrieve category and inflection for a term
L:n	Ln	Retrieve category and inflection from lexicon
L:p	Lp	Retrieve category and inflection for all terms that begin with the given word
m	m	Generate the Metaphone spelling normalized form
n	n	No operation
	nom	Retrieve nominalizations
N, N:2	N	Normalize the input text in a non-canonical way (Norm)
N:3	N3	LuiNorm (canonical way normalization)
o	o	Replace punctuation with spaces
p	p	Strip punctuation
P	P	Strip punctuation, enhanced
q	q	Strip diacritics
	q0	Map Symbols & Punctuation to ASCII
	q1	Map Unicode to ASCII
	q2	Split Ligatures
	q3	Get Unicode names
	q4	Get Unicode base synonym
	q5	Normalize Unicode to ASCII
	q6	Normalize Unicode to ASCII with synonym Option
	q7	Unicode Core Norm
	q8	Strip or Map Unicode to ASCII
r	r	Generate synonyms, recursively
	rs	Remove plural patterns of (s), (es), and (ies)
R	R	Generate derivational variants, recursively
s	s	Generate known spelling variants
S	S	Syntactic uninvert
	Si	Map inflections into simple inflections
t	t	Strip stop words
T	T	Strip ambiguity tags
u	u	Uninvert the input phrase around commas
U	U	Convert the output of the Xerox PARC stochastic tagger into lvg-style pipe-delimited format
v	v	Retrieve fruitful variants from precomputed data
w	w	Sort words by order
wsN	ws~INT	Filter words by specified word size
y	y	Generate synonyms
z	z	Generate antonyms
zs	zs	Antonym substitutions