STMT

Sub-Term Mapping Tools

STMT - Baseline Requirements

1. Tool Package:

ID	Description	Notes
1.1	downloadable package	Done
1.2	installable tools	Done
1.3	a stand-alone tool package	Done
1.4	provides command-line tools with functions described in the next section of tool functions	Done
1.5	provides Java APIs with functions described in the next section of tool functions	Done

2. Tool Core Functions:

ID	Description	Notes
2.1 Generic Tools
2.1.1	A general-purpose tool set that provides functions to find, in a specified corpus for a term: sub-terms; prefix sub-terms; the longest prefix sub-term; sub-term patterns; all permutations of synonymous sub-term substitutions	Done
2.1.2	Configurable tool	Done
2.2 Corpus
2.2.1	Corpus is specified in a file and can be loaded automatically	Done
2.2.2	Preloaded corpus file of Lexicon	Done
2.2.3	Preloaded corpus file of UMLS-Core synonyms	Done
2.3 Sub-term
2.3.1	a sub-term is a term that is a subset of another term in the corpus	Done
2.3.2	Find all sub-terms in the corpus with starting and ending position index of the input term	Done
2.3.3	Find the longest prefix sub-term, which starts with the beginning of the input term	Done
2.3.4	Find all prefix sub-terms	Done
2.4 Sub-term Patterns
2.4.1	Find all sub-term patterns	Done
2.4.2	Find sub-term patterns with specified sub-term number	Done
2.5 Synonymous sub-term substitutions
2.5.1	Find all permutations of all synonymous sub-term substitutions on specified sub-term patterns	Done
2.5.2	The output is a list of strings, one for each permuted pattern above	Done
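
The requirements in 2.3 through 2.5 can be illustrated with a minimal Python sketch. It is not the STMT Java API. It assumes that the corpus is an in-memory set of already-normalized terms, that position indexes are word offsets (start inclusive, end exclusive), and that a "sub-term pattern" is a segmentation of the input term into consecutive sub-terms that together cover the whole term. All function names are hypothetical.

```python
from itertools import product


def find_subterms(term, corpus):
    """Return (start, end, text) for every contiguous word span of term found in corpus."""
    words = term.split()
    found = []
    for start in range(len(words)):
        for end in range(start + 1, len(words) + 1):
            text = " ".join(words[start:end])
            if text in corpus:
                found.append((start, end, text))
    return found


def prefix_subterms(term, corpus):
    """Sub-terms that start at the beginning of the input term."""
    return [s for s in find_subterms(term, corpus) if s[0] == 0]


def longest_prefix_subterm(term, corpus):
    """The prefix sub-term covering the most words, or None if there is none."""
    prefixes = prefix_subterms(term, corpus)
    return max(prefixes, key=lambda s: s[1]) if prefixes else None


def subterm_patterns(term, corpus, size=None):
    """Assumed definition: every way to split term into consecutive corpus sub-terms.

    If size is given, keep only patterns with exactly that many sub-terms.
    """
    words = term.split()
    patterns = []

    def extend(start, parts):
        if start == len(words):
            if size is None or len(parts) == size:
                patterns.append(parts)
            return
        for end in range(start + 1, len(words) + 1):
            text = " ".join(words[start:end])
            if text in corpus:
                extend(end, parts + [text])

    extend(0, [])
    return patterns


def substitutions(pattern, synonyms):
    """All strings obtained by keeping or replacing each sub-term with one of its synonyms."""
    choices = [[part] + sorted(synonyms.get(part, ())) for part in pattern]
    return [" ".join(combo) for combo in product(*choices)]


corpus = {"dog", "bite", "dog bite", "wound", "bite wound"}
synonyms = {"wound": {"injury", "lesion"}}
print(longest_prefix_subterm("dog bite wound", corpus))   # (0, 2, 'dog bite')
print(subterm_patterns("dog bite wound", corpus, size=2))
print(substitutions(["dog bite", "wound"], synonyms))
```

The exhaustive span search is quadratic in the number of words, which is acceptable for short terms; a real implementation over a large corpus would more likely use a trie or hash index.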

3. Tool Other Functions:

ID	Description	Notes
3.1 Normalization
3.1.1	LexItem Norm: ignore case and punctuation (-f:g:rs:o:l)	Done
3.1.2	Synonym Norm: ignore case, punctuation, inflectional variants, and spelling variants (-f:g:rs:Ct:o:l)	Done
3.1.3	Lvg Norm: ignore non-ASCII Unicode, case, punctuation, inflectional variants, spelling variants, word order, etc. (-f:q0:g:rs:o:t:l:B:Ct:q7:q8:w)	Done
3.1.4	Other customized Norm	Done
3.2 Synonym Definition
3.2.1	From the UMLS-Core synonyms collection, including (but not limited to) lexical synonyms, spelling variants, acronyms, abbreviations, British English, Greco-Latin, Device, etc.	Done
3.2.2	Assume all synonyms are the base (uninflected) forms	Done
3.2.3	Ignore case	Done
3.2.4	Strip punctuation	Done
3.2.5	Provide word-to-word and word-to-term (multiword) mappings	Done
3.2.6	Category (part of speech) is not used	Done
3.2.7	All synonyms are symmetrical (if A is a synonym of B, then B must be a synonym of A)	Done
3.2.8	No recursive synonyms are used (if A is a synonym of B and B is a synonym of C, then C is a recursive synonym of A)	Done
3.3 Synonyms Source
3.3.1	Provide a default synonym source (from UMLS-Core synonyms files)	Done
3.3.2	Allow users to supply their own synonyms from a flat file (appended to the default synonyms)	Done
3.3.3	Use # for comments	Done
3.3.4	Ignore duplications	Done
3.3.5	Automatically generate symmetrical synonyms	Done
3.3.6	No category is used (2 fields only)	Done
3.3.7	Use pipe “|” to separate fields	Done
3.3.8	All synonyms' keys (not values) should be normalized	Done
3.3.10	Configurable option to use the default synonyms or customized synonyms	Done
3.4 Synonyms Mapping Functions
3.4.1	Generates synonyms of normalized input	Done
3.4.2	Generates recursive synonyms of normalized input	Done
3.4.3	Generates recursive synonyms of normalized input for words and terms by specifying the recursive depth	Done
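
The synonym source rules in 3.3 and the mapping functions in 3.4 can be sketched in Python as follows. This is an illustration under stated assumptions, not the STMT implementation: `normalize` is a simplified stand-in that only lowercases and replaces ASCII punctuation with spaces (it does not reproduce the Lvg flows listed in 3.1), comments are assumed to be whole lines starting with `#`, lines that do not have exactly two fields are assumed to be skipped, and all function names are hypothetical.

```python
import string
from collections import defaultdict

# Translation table mapping every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def normalize(text):
    """Simplified norm (assumption): lowercase, punctuation to spaces, collapse whitespace."""
    return " ".join(text.lower().translate(_PUNCT_TO_SPACE).split())


def load_synonyms(lines):
    """Build a key -> set-of-values table from 'term|synonym' lines.

    Keys are normalized and values are kept as written (3.3.8). Storing values
    in sets ignores duplicates (3.3.4), and each pair is added in both
    directions to make the table symmetrical (3.3.5).
    """
    table = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment (3.3.3)
        fields = line.split("|")
        if len(fields) != 2:
            continue  # assumption: skip lines without exactly two fields (3.3.6)
        left, right = (field.strip() for field in fields)
        table[normalize(left)].add(right)
        table[normalize(right)].add(left)
    return table


def synonyms(term, table, depth=1):
    """Normalized synonyms reachable from term within depth hops.

    depth=1 gives direct synonyms; larger values give recursive synonyms.
    """
    start = normalize(term)
    seen = {start}
    frontier = [start]
    for _ in range(depth):
        next_frontier = []
        for key in frontier:
            for value in table.get(key, ()):
                norm = normalize(value)
                if norm not in seen:
                    seen.add(norm)
                    next_frontier.append(norm)
        frontier = next_frontier
    seen.discard(start)
    return sorted(seen)


sample = ["# sample synonym file", "dog|canine", "canine|K-9", "dog|canine"]
table = load_synonyms(sample)
print(synonyms("Dog", table))            # ['canine']
print(synonyms("Dog", table, depth=2))   # ['canine', 'k 9']
```

The breadth-first expansion tracks visited terms, so symmetrical entries cannot cause an endless loop, and the depth argument bounds how far recursive synonyms can drift from the input.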