Sub-Term Mapping Tools

STMT - Baseline Requirements

1. Tool Package:

ID | Description | Notes
1.1 | Downloadable package | Done
1.2 | Installable tools | Done
1.3 | A stand-alone tool package | Done
1.4 | Provides command-line tools with the functions described in the next section (Tool Core Functions) | Done
1.5 | Provides Java APIs with the functions described in the next section (Tool Core Functions) | Done

2. Tool Core Functions:

ID | Description | Notes
2.1 Generic Tools
2.1.1 | A general-purpose tool set that finds the following for a term in a specified corpus: sub-terms; prefix sub-terms; the longest prefix sub-term; sub-term patterns; all permutations of synonymous sub-term substitutions | Done
2.1.2 | Configurable tool | Done
2.2 Corpus
2.2.1 | The corpus is specified in a file and can be loaded automatically | Done
2.2.2 | Preloaded corpus file of the Lexicon | Done
2.2.3 | Preloaded corpus file of UMLS-Core synonyms | Done
2.3 Sub-term
2.3.1 | A sub-term is a term in the corpus that is a subset of another term | Done
2.3.2 | Find all sub-terms in the corpus, with their starting and ending position indexes in the input term | Done
2.3.3 | Find the longest prefix sub-term, which starts at the beginning of the input term | Done
2.3.4 | Find all prefix sub-terms | Done
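
The sub-term requirements in 2.3 can be illustrated with a minimal sketch. It assumes whitespace tokenization, a corpus held in memory as a set of lowercase strings, and word-level position indexes with an exclusive end index. The function names are hypothetical and are not the STMT Java API; the sketch also counts the whole input term as a sub-term when it is in the corpus, which the requirements do not specify.

```python
from typing import List, Optional, Set, Tuple

# (sub-term text, start word index, end word index exclusive)
SubTerm = Tuple[str, int, int]


def find_sub_terms(term: str, corpus: Set[str]) -> List[SubTerm]:
    """Return every contiguous word span of `term` that appears in `corpus`."""
    words = term.lower().split()
    found = []
    for start in range(len(words)):
        for end in range(start + 1, len(words) + 1):
            candidate = " ".join(words[start:end])
            if candidate in corpus:
                found.append((candidate, start, end))
    return found


def find_prefix_sub_terms(term: str, corpus: Set[str]) -> List[SubTerm]:
    """Keep only the sub-terms that start at the first word of `term`."""
    return [s for s in find_sub_terms(term, corpus) if s[1] == 0]


def find_longest_prefix_sub_term(term: str, corpus: Set[str]) -> Optional[SubTerm]:
    """Return the prefix sub-term covering the most words, or None."""
    return max(find_prefix_sub_terms(term, corpus), key=lambda s: s[2], default=None)


# Illustrative corpus, made up for this example.
corpus = {"heart", "heart attack", "attack", "risk"}
term = "heart attack risk factor"
print(find_sub_terms(term, corpus))
print(find_longest_prefix_sub_term(term, corpus))  # ('heart attack', 0, 2)
```

The nested loop tries every contiguous span, which is quadratic in the number of words; that is acceptable for a sketch over short terms.
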
2.4 Sub-term Patterns
2.4.1 | Find all sub-term patterns | Done
2.4.2 | Find sub-term patterns with a specified number of sub-terms | Done
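
The requirements above do not define a sub-term pattern, so the following sketch rests on an assumption: a pattern is taken to be a set of non-overlapping sub-term spans within the input term, with the remaining words left as plain words. Under that reading, 2.4.2 becomes a filter on the number of spans. All names here are hypothetical.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (start word index, end word index exclusive)


def sub_term_patterns(spans: List[Span], count: Optional[int] = None) -> List[List[Span]]:
    """Enumerate every set of non-overlapping spans, optionally with exactly `count` spans."""
    ordered = sorted(set(spans))
    patterns: List[List[Span]] = []

    def extend(chosen: List[Span], first_candidate: int) -> None:
        patterns.append(list(chosen))
        for i in range(first_candidate, len(ordered)):
            start, _ = ordered[i]
            # Spans are sorted, so checking against the last chosen span is enough.
            if not chosen or start >= chosen[-1][1]:
                chosen.append(ordered[i])
                extend(chosen, i + 1)
                chosen.pop()

    extend([], 0)
    if count is not None:
        patterns = [p for p in patterns if len(p) == count]
    return patterns


def render(words: List[str], pattern: List[Span]) -> str:
    """Show a pattern as text, with each sub-term in square brackets."""
    ends = dict(pattern)  # start index -> end index
    out, i = [], 0
    while i < len(words):
        if i in ends:
            out.append("[" + " ".join(words[i:ends[i]]) + "]")
            i = ends[i]
        else:
            out.append(words[i])
            i += 1
    return " ".join(out)


words = ["heart", "attack", "risk", "factor"]
spans = [(0, 1), (0, 2), (1, 2), (2, 3)]  # illustrative sub-term spans
for pattern in sub_term_patterns(spans, count=2):
    print(render(words, pattern))
```

The empty pattern (no sub-terms selected) is included when no count is given; whether the real tool reports it is not stated in the requirements.
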
2.5 Synonymous sub-term substitutions
2.5.1 | Find all permutations of all synonymous sub-term substitutions on specified sub-term patterns | Done
2.5.2 | The output is a list of strings of the above permuted patterns | Done
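
Requirements 2.5.1 and 2.5.2 can be sketched as a Cartesian product over the alternatives for each sub-term in a pattern. The sketch assumes a pattern is given as a list of (text, is_sub_term) pairs and that synonyms come from an in-memory dictionary; both representations, the function name, and the sample synonyms are hypothetical. It also keeps the unsubstituted term in the output, which the requirements leave open.

```python
from itertools import product
from typing import Dict, List, Tuple

Segment = Tuple[str, bool]  # (text, True if the segment is a sub-term)


def substitute_synonyms(pattern: List[Segment], synonyms: Dict[str, List[str]]) -> List[str]:
    """Return every string obtained by swapping sub-terms for their synonyms."""
    choices = []
    for text, is_sub_term in pattern:
        options = [text]
        if is_sub_term:
            # Skip synonyms that repeat an option already listed.
            for synonym in synonyms.get(text, []):
                if synonym not in options:
                    options.append(synonym)
        choices.append(options)
    return [" ".join(combo) for combo in product(*choices)]


# Illustrative data only.
synonyms = {
    "heart attack": ["myocardial infarction", "mi"],
    "risk": ["hazard"],
    "factor": ["element"],
}
pattern = [("heart attack", True), ("risk", True), ("factor", False)]
for line in substitute_synonyms(pattern, synonyms):
    print(line)
```

Because "factor" is a plain word in this pattern, it is never substituted even though the dictionary has an entry for it; the pattern yields 3 x 2 = 6 strings.
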

3. Tool Other Functions:

ID | Description | Notes
3.1 Normalization
3.1.1 | LexItem Norm: ignore case and punctuation (-f:g:rs:o:l) | Done
3.1.2 | Synonym Norm: ignore case, punctuation, inflectional variants, and spelling variants (-f:g:rs:Ct:o:l) | Done
3.1.3 | Lvg Norm: ignore non-ASCII Unicode, case, punctuation, inflectional variants, spelling variants, word order, etc. (-f:q0:g:rs:o:t:l:B:Ct:q7:q8:w) | Done
3.1.4 | Other customized Norm | Done
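
As a rough illustration of what "ignore case and punctuation" means for matching, the sketch below lowercases a string and replaces ASCII punctuation with spaces. It is a simplified stand-in with a hypothetical name; it does not reproduce the flows listed in 3.1.1 to 3.1.3 and does nothing about inflectional or spelling variants.

```python
import string

# Map every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans({ch: " " for ch in string.punctuation})


def simple_norm(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(text.translate(_PUNCT_TO_SPACE).lower().split())


print(simple_norm("Heart-Attack, (acute)"))  # heart attack acute
```

Two strings that differ only in case or punctuation normalize to the same key, so they can be matched with a plain dictionary or set lookup.
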
3.2 Synonym Definition
3.2.1 | From the UMLS-Core synonyms collection; includes (but is not limited to) lexical synonyms, spelling variants, acronyms, abbreviations, British English, Greco-Latin, Device, etc. | Done
3.2.2 | Assume all synonyms are in the base (uninflected) form | Done
3.2.3 | Ignore case | Done
3.2.4 | Strip punctuation | Done
3.2.5 | Provide word-to-word and word-to-term (multi-word) mappings | Done
3.2.6 | Category (part of speech) is not used | Done
3.2.7 | All synonyms are symmetrical (if A is a synonym of B, then B must be a synonym of A) | Done
3.2.8 | No recursive synonyms are used (if A is a synonym of B and B is a synonym of C, then C is a recursive synonym of A) | Done
3.3 Synonyms Source
3.3.1 | Provide a default synonym source (from the UMLS-Core synonyms files) | Done
3.3.2 | Allow users to add their own synonyms from a flat file (appended to the default synonyms) | Done
3.3.3 | Use # for comments | Done
3.3.4 | Ignore duplicates | Done
3.3.5 | Automatically generate symmetrical synonyms | Done
3.3.6 | No category is used (2 fields only) | Done
3.3.7 | Use pipe “|” to separate fields | Done
3.3.8 | All synonym keys (not values) should be normalized | Done
3.3.10 | Configurable option to use the default synonyms or customized synonyms | Done
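
The file conventions in 3.3 (two pipe-separated fields, # comments, duplicates ignored, symmetrical pairs generated automatically, keys normalized but values left as written) can be sketched as a small loader. The function name is hypothetical, lowercasing stands in for the real key normalization, and skipping malformed lines is this sketch's choice rather than documented behavior.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def load_synonyms(lines: Iterable[str],
                  normalize: Callable[[str], str] = str.lower) -> Dict[str, List[str]]:
    """Build a symmetrical synonym table from 'term|synonym' lines."""
    table: Dict[str, List[str]] = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 2 or not all(fields):
            continue  # sketch: skip lines that are not exactly two non-empty fields
        left, right = fields
        # Add both directions so the table is symmetrical.
        for key, value in ((left, right), (right, left)):
            key = normalize(key)
            if value not in table[key]:  # ignore duplicates
                table[key].append(value)
    return dict(table)


custom = [
    "# custom synonyms",
    "dog|canine",
    "canine|dog",
    "dog|canine",
    "Dog|K9",
]
print(load_synonyms(custom))
# {'dog': ['canine', 'K9'], 'canine': ['dog'], 'k9': ['Dog']}
```

Appending user synonyms to the defaults (3.3.2) then amounts to passing the default lines followed by the custom lines to the same loader.
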
3.4 Synonyms Mapping Functions
3.4.1 | Generates synonyms of normalized input | Done
3.4.2 | Generates recursive synonyms of normalized input | Done
3.4.3 | Generates recursive synonyms of normalized input for words and terms by specifying the recursive depth | Done
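
Depth-limited recursive synonyms (3.4.3) can be sketched as a breadth-first walk over a synonym table, where depth 1 returns the direct synonyms and each further level follows synonyms of synonyms. The function name and the table layout (a dictionary from normalized term to a list of normalized synonyms) are assumptions made for this illustration.

```python
from typing import Dict, List


def recursive_synonyms(term: str, table: Dict[str, List[str]], depth: int) -> List[str]:
    """Return synonyms reachable from `term` in at most `depth` steps."""
    seen = {term}
    frontier = [term]
    result: List[str] = []
    for _ in range(depth):
        next_frontier = []
        for current in frontier:
            for synonym in table.get(current, []):
                if synonym not in seen:  # never revisit a term
                    seen.add(synonym)
                    result.append(synonym)
                    next_frontier.append(synonym)
        if not next_frontier:
            break  # nothing new at this level
        frontier = next_frontier
    return result


# Illustrative symmetrical table.
table = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(recursive_synonyms("a", table, 1))  # ['b']
print(recursive_synonyms("a", table, 2))  # ['b', 'c']
```

The `seen` set is what keeps a symmetrical table from looping back to the input term, and it guarantees termination for any depth.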