LuiNormalize
This flow is provided for backward compatibility, with some feature enhancements; the -f:N flow is a better choice for normalization. This process abstracts away from case, inflection, and word order.
It also removes stop words, possessives, and parenthetic plural forms; strips diacritics; splits ligatures; maps non-ASCII symbols to ASCII; replaces punctuation with spaces; and strips or maps non-ASCII Unicode characters in the input term.
Specifically, this normalization is roughly equivalent to the combined flow options -f:q7:g:rs:o:t:l:B:C:q8:w, applied in that order.
That is: map non-ASCII Unicode characters to ASCII (q7), remove genitives (g), remove parenthetic plural forms (rs), replace punctuation with spaces (o), remove stop words (t), lowercase (l), uninflect the words (B), map each uninflected word to its canonical form (C), strip or map any remaining non-ASCII Unicode characters to ASCII (q8), and finally sort the words alphabetically (w).
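The ordered steps above can be illustrated with a minimal Python sketch. It is not lvg's implementation: the real tool consults the SPECIALIST Lexicon and its own stop-word list, whereas STOP_WORDS, UNINFLECTED, and CANONICAL here are hypothetical toy tables, the genitive and parenthetic-plural patterns are simplified assumptions, and q7/q8 are approximated with Unicode NFKD decomposition and plain stripping. The function name lui_normalize_sketch is likewise illustrative.

```python
import re
import string
import unicodedata

# Hypothetical toy tables standing in for the SPECIALIST Lexicon and lvg's
# stop-word list; the real tool's data are far larger and may differ.
STOP_WORDS = {"a", "an", "and", "by", "for", "in", "of", "on", "the", "to", "with"}
UNINFLECTED = {  # inflected word -> candidate uninflected (base) forms
    "fingers": ["finger"],
    "colored": ["color"],
    "coloured": ["colour"],
    "buckls": ["buckl", "buckls"],
}
CANONICAL = {  # base form -> candidate canonical forms
    "colour": ["color"],
    "block": ["bloc", "block"],
}
# Translation table that turns every ASCII punctuation character into a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def unicode_to_ascii(text):
    """q7 (approximation): NFKD-decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def lui_normalize_sketch(term):
    s = unicode_to_ascii(term)                            # q7
    s = re.sub(r"'s\b|(?<=s)'(?=\s|$)", "", s)            # g: simplified genitives
    s = re.sub(r"\((?:s|es)\)", "", s)                    # rs: "(s)" and "(es)" only
    s = s.translate(PUNCT_TO_SPACE)                       # o
    words = [w for w in s.split() if w.lower() not in STOP_WORDS]  # t
    words = [w.lower() for w in words]                    # l
    # B and C: when several candidates exist, keep the alphabetically smallest.
    words = [min(UNINFLECTED.get(w, [w])) for w in words]  # B
    words = [min(CANONICAL.get(w, [w])) for w in words]    # C
    words = ["".join(ch for ch in w if ord(ch) < 128) for w in words]  # q8: strip only
    return " ".join(sorted(w for w in words if w))        # w: sort the words


print(lui_normalize_sketch("Burn(s);skin"))                  # burn skin
print(lui_normalize_sketch("Dysenterie amibienne (aiguë)"))  # aigue amibienne dysenterie
print(lui_normalize_sketch("lvg© 2008"))                     # 2008 lvg
```

Words missing from the toy tables pass through unchanged, whereas lvg would fall back on its morphology rules, as in the "buckls" example below.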
Only one output record is generated per input term.
The -m option flag has no effect. "none" is added at the end of the output.
This flow option is useful for building a compact index over a set of terms that still improves retrieval. It is used to build the UMLS Metathesaurus normalized word and string indexes, so queries against those indexes should be transformed with this flow option as well.
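A minimal sketch of why a query must go through the same normalization as the index. The NORMALIZED table holds input/output pairs copied from the lvg -f:N3 example in this section and stands in for actually running the flow; normalize, build_index, and lookup are hypothetical helpers, not part of lvg or the UMLS distribution, and a dictionary keyed on the normalized string only approximates a normalized string index.

```python
from collections import defaultdict

# Input -> normalized output pairs copied from the lvg -f:N3 example output;
# a real pipeline would obtain these by running the flow on each string.
NORMALIZED = {
    "colored": "color",
    "coloured": "color",
    "zolasepam": "zolasepam",
    "zolazepam": "zolasepam",
    "megaoesophagus": "megaesophagus",
    "megaesophagus": "megaesophagus",
}


def normalize(term):
    return NORMALIZED[term]


def build_index(terms):
    """Group indexed terms under their normalized form."""
    index = defaultdict(set)
    for term in terms:
        index[normalize(term)].add(term)
    return index


def lookup(index, query):
    # The query is normalized exactly as the indexed terms were.
    return index.get(normalize(query), set())


index = build_index(["colored", "zolasepam", "megaesophagus"])
print(sorted(lookup(index, "coloured")))        # ['colored']
print(sorted(lookup(index, "megaoesophagus")))  # ['megaesophagus']
```

A raw-string lookup of "coloured" would miss "colored"; normalizing both sides makes the spelling variants land on the same key.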
Note. The normalized form is, in reality, a representation of a class of terms, rather than a word. As such, a numeric representation of that class is just as valid.
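To illustrate that the normalized string only labels a class, the sketch below replaces each distinct normalized form with a sequential integer. The pairs come from the example output in this section; assign_class_ids is a hypothetical helper, and the numbering scheme (first-seen order, starting at 1) is arbitrary.

```python
# (input term, normalized form) pairs copied from the lvg -f:N3 example output.
PAIRS = [
    ("fing", "fing"),
    ("finger", "finger"),
    ("fingers", "finger"),
    ("colored", "color"),
    ("coloured", "color"),
]


def assign_class_ids(pairs):
    """Map each term to an integer identifying its normalized class."""
    class_ids = {}  # normalized form -> integer id, numbered in first-seen order
    term_ids = {}
    for term, normalized in pairs:
        class_id = class_ids.setdefault(normalized, len(class_ids) + 1)
        term_ids[term] = class_id
    return term_ids


print(assign_class_ids(PAIRS))
# {'fing': 1, 'finger': 2, 'fingers': 2, 'colored': 3, 'coloured': 3}
```

Two terms share an ID exactly when they share a normalized form, so for matching purposes the integer carries the same information as the string.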
For example, "buckls" is not in the lexicon, so morphology rules are used to find its uninflected candidates, "buckl" and "buckls". "buckl" is chosen as the base form because it comes first in alphabetical order. The canonical form "buckl" is then found for "buckls".
Similarly, the citation forms of "block" are "bloc" when it is a noun and "block" when it is a verb. "bloc" is chosen as the canonical form because it comes first in alphabetical order.
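Both choices above apply the same tie-breaking rule: among several candidates, keep the one that sorts first. A minimal Python sketch, assuming plain string comparison; Python's min compares strings by Unicode code point, which matches alphabetical order for lowercase ASCII words like these (the flow lowercases with l before it uninflects with B and canonicalizes with C). Whether lvg compares strings exactly this way is an assumption, and pick_first_alphabetically is a hypothetical name.

```python
def pick_first_alphabetically(candidates):
    """Return the candidate that sorts first under plain string comparison."""
    return min(candidates)


# Candidate lists taken from the two examples above.
print(pick_first_alphabetically(["buckl", "buckls"]))  # buckl (base form of "buckls")
print(pick_first_alphabetically(["bloc", "block"]))    # bloc (canonical form of "block")
```

A string sorts before any longer string it is a prefix of, which is why "buckl" beats "buckls" and "bloc" beats "block".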
shell> lvg -f:N3
fing
fing|fing|2047|1|q7+g+rs+o+t+l+B+C+q8+w|1|
finger
finger|finger|2047|1|q7+g+rs+o+t+l+B+C+q8+w|1|
fingers
fingers|finger|2047|1|q7+g+rs+o+t+l+B+C+q8+w|1|
colored
colored|color|2047|1|q7+g+rs+o+t+l+B+C+q8+w|1|
coloured
coloured|color|2047|1|q7+g+rs+o+t+l+B+C+q8+w|1|
zolasepam
zolasepam|zolasepam|2047|1|q7+g+rs+o+t+l+B+C+q8+w|1|
zolazepam
zolazepam|zolasepam|2047|1|q7+g+rs+o+t+l+B+C+q8+w|1|
megaoesophagus
megaoesophagus|megaesophagus|2047|1|q7+g+rs+o+t+l+B+C+q8+w|1|
megaesophagus
megaesophagus|megaesophagus|2047|1|q7+g+rs+o+t+l+B+C+q8+w|1|
lvg© 2008
lvg© 2008|2008 lvg|2047|1|q7+g+rs+o+t+l+B+C+q8+w|1|
Dysenterie amibienne (aiguë)
Dysenterie amibienne (aiguë)|aigue amibienne dysenterie|2047|1|q7+g+rs+o+t+l+B+C+q8+w|1|
Burn(s);skin
Burn(s);skin|burn skin|2047|1|q7+g+rs+o+t+l+B+C+q8+w|1|
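Each output line in the transcript is a pipe-delimited record. A minimal parser sketch, assuming the field order usually documented for lvg output (input, output, categories, inflections, flow history, flow number, then a trailing delimiter) and assuming no field contains a literal "|"; LvgRecord and parse_record are hypothetical names, not part of lvg's API. Splitting the flow history on "+" recovers the component sequence given in -f:q7:g:rs:o:t:l:B:C:q8:w.

```python
from typing import List, NamedTuple


class LvgRecord(NamedTuple):
    input: str
    output: str
    categories: int
    inflections: int
    flow_history: List[str]
    flow_number: int


def parse_record(line):
    """Parse one pipe-delimited lvg output line (assumed field order)."""
    fields = line.rstrip("\n").split("|")
    if fields and fields[-1] == "":
        fields.pop()  # drop the empty field left by the trailing "|"
    # Only the first six fields are used; any extra fields are ignored.
    term, output, categories, inflections, history, flow_number = fields[:6]
    return LvgRecord(term, output, int(categories), int(inflections),
                     history.split("+"), int(flow_number))


rec = parse_record("Burn(s);skin|burn skin|2047|1|q7+g+rs+o+t+l+B+C+q8+w|1|")
print(rec.output)                  # burn skin
print(":".join(rec.flow_history))  # q7:g:rs:o:t:l:B:C:q8:w
```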