STMT - Baseline Requirements
1. Tool Package:
ID | Description | Notes |
---|---|---|
1.1 | downloadable package | Done |
1.2 | installable tools | Done |
1.3 | a stand-alone tool package | Done |
1.4 | provides command-line tools for the functions described in the tool functions sections below | Done |
1.5 | provides Java APIs for the functions described in the tool functions sections below | Done |
2. Tool Core Functions:
ID | Description | Notes |
---|---|---|
2.1 | Generic Tools | |
2.1.1 | A general-purpose tool set that provides functions to find sub-terms, sub-term patterns, and synonymous sub-term substitutions (see 2.3 to 2.5) | Done |
2.1.2 | Configurable tool | Done |
2.2 | Corpus | |
2.2.1 | The corpus is specified in a file and can be loaded automatically | Done |
2.2.2 | Preloaded corpus file from the Lexicon | Done |
2.2.3 | Preloaded corpus file from the UMLS-Core synonyms | Done |
2.3 | Sub-term | |
2.3.1 | A sub-term is a term that is a subset of another term in the corpus | Done |
2.3.2 | Find all sub-terms of the input term that are in the corpus, with their starting and ending position indexes in the input term | Done |
2.3.3 | Find the longest prefix sub-term, i.e., the longest sub-term that starts at the beginning of the input term | Done |
2.3.4 | Find all prefix sub-terms | Done |
2.4 | Sub-term Patterns | |
2.4.1 | Find all sub-term patterns | Done |
2.4.2 | Find sub-term patterns with a specified number of sub-terms | Done |
2.5 | Synonymous Sub-term Substitutions | |
2.5.1 | Find all permutations of synonymous sub-term substitutions on the specified sub-term patterns | Done |
2.5.2 | The output is a list of strings of the permuted patterns above | Done |
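The core functions in section 2 can be illustrated with a minimal Java sketch. It is not the STMT Java API: the class `SubTermSketch`, its methods, and the `SubTerm` record are hypothetical names, and the sketch assumes lowercase, whitespace-tokenized terms, a corpus held in memory as a `Set<String>`, and word positions (start inclusive, end exclusive). Pattern enumeration (2.4) is left out; `substitute` takes one pattern as an ordered list of sub-terms and returns every combination of keeping each sub-term or replacing it with one of its synonyms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of requirements 2.3 and 2.5; not the STMT Java API.
public class SubTermSketch {

    // A sub-term and its word positions in the input term (start inclusive, end exclusive).
    record SubTerm(String term, int start, int end) {}

    // 2.3.2: every contiguous word span of the input term that is in the corpus.
    static List<SubTerm> findSubTerms(String inputTerm, Set<String> corpus) {
        String[] words = inputTerm.trim().split("\\s+");
        List<SubTerm> found = new ArrayList<>();
        for (int start = 0; start < words.length; start++) {
            for (int end = start + 1; end <= words.length; end++) {
                String span = String.join(" ", Arrays.copyOfRange(words, start, end));
                if (corpus.contains(span)) {
                    found.add(new SubTerm(span, start, end));
                }
            }
        }
        return found;
    }

    // 2.3.4: sub-terms that start at the first word of the input term.
    static List<SubTerm> findPrefixSubTerms(String inputTerm, Set<String> corpus) {
        return findSubTerms(inputTerm, corpus).stream()
                .filter(s -> s.start() == 0)
                .toList();
    }

    // 2.3.3: the longest prefix sub-term, if any (all prefixes start at 0, so max end wins).
    static Optional<SubTerm> findLongestPrefixSubTerm(String inputTerm, Set<String> corpus) {
        return findPrefixSubTerms(inputTerm, corpus).stream()
                .max(Comparator.comparingInt(SubTerm::end));
    }

    // 2.5: for one pattern (its sub-terms in order), keep each sub-term or swap in
    // one of its synonyms, and return every resulting string (a cartesian product).
    static List<String> substitute(List<String> pattern, Map<String, Set<String>> synonyms) {
        List<String> results = List.of("");
        for (String subTerm : pattern) {
            List<String> options = new ArrayList<>();
            options.add(subTerm);
            options.addAll(synonyms.getOrDefault(subTerm, Set.of()));
            List<String> next = new ArrayList<>();
            for (String partial : results) {
                for (String option : options) {
                    next.add(partial.isEmpty() ? option : partial + " " + option);
                }
            }
            results = next;
        }
        return results;
    }

    public static void main(String[] args) {
        Set<String> corpus = Set.of("heart", "heart attack", "attack", "symptoms");
        String input = "heart attack symptoms";
        System.out.println(findSubTerms(input, corpus));
        System.out.println(findLongestPrefixSubTerm(input, corpus));
        Map<String, Set<String>> synonyms = Map.of("heart attack", Set.of("myocardial infarction"));
        System.out.println(substitute(List.of("heart attack", "symptoms"), synonyms));
    }
}
```

With the sample corpus, the longest prefix sub-term is `heart attack` at word positions 0 to 2, and the last line printed is `[heart attack symptoms, myocardial infarction symptoms]`.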
3. Tool Other Functions:
ID | Description | Notes |
---|---|---|
3.1 | Normalization | |
3.1.1 | LexItem Norm: ignore case and punctuation (-f:g:rs:o:l) | Done |
3.1.2 | Synonym Norm: ignore case, punctuation, inflectional variants, and spelling variants (-f:g:rs:Ct:o:l) | Done |
3.1.3 | Lvg Norm: ignore non-ASCII Unicode, case, punctuation, inflectional variants, spelling variants, word order, etc. (-f:q0:g:rs:o:t:l:B:Ct:q7:q8:w) | Done |
3.1.4 | Other customized norms | Done |
3.2 | Synonym Definition | |
3.2.1 | From the UMLS-Core synonyms collection, including (but not limited to) lexical synonyms, spelling variants, acronyms, abbreviations, British English, Greco-Latin, Device, etc. | Done |
3.2.2 | Assume all synonyms are the base (uninflected) forms | Done |
3.2.3 | Ignore case | Done |
3.2.4 | Strip punctuation | Done |
3.2.5 | Provide word-to-word and word-to-term (multi-word) mappings | Done |
3.2.6 | Category (part of speech) is not used | Done |
3.2.7 | All synonyms are symmetrical (if A is a synonym of B, then B must be a synonym of A) | Done |
3.2.8 | No recursive synonyms are used (if A is a synonym of B and B is a synonym of C, then C is a recursive synonym of A) | Done |
3.3 | Synonym Source | |
3.3.1 | Provide a default synonym source (from UMLS-Core synonyms files) | Done |
3.3.2 | Allow users to supply their own synonyms in a flat file (appended to the default synonyms) | Done |
3.3.3 | Use # for comments | Done |
3.3.4 | Ignore duplications | Done |
3.3.5 | Automatically generate symmetrical synonyms | Done |
3.3.6 | No category is used (two fields only) | Done |
3.3.7 | Use a pipe (`\|`) to separate fields | Done |
3.3.8 | All synonyms' keys (not values) should be normalized | Done |
3.3.10 | Configurable option to use the default synonyms or customized synonyms | Done |
3.4 | Synonym Mapping Functions | |
3.4.1 | Generate synonyms of the normalized input | Done |
3.4.2 | Generate recursive synonyms of the normalized input | Done |
3.4.3 | Generate recursive synonyms of the normalized input for words and terms, to a specified recursion depth | Done |
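The synonym-source and mapping requirements in 3.3 and 3.4 can be sketched in Java as follows. This is a hypothetical illustration, not STMT's implementation: `SynonymSketch`, `norm`, `load`, `synonyms`, and `recursiveSynonyms` are invented names, and `norm` approximates only the case and punctuation steps of the LexItem norm, not the full Lvg flow. The sketch reads two-field, pipe-separated lines, skips `#` comment lines, ignores duplicate pairs, stores each pair in both directions, and normalizes keys but not values.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of requirements 3.3 and 3.4; not STMT's implementation.
public class SynonymSketch {

    // Normalized key -> synonym values in insertion order; the Set drops duplicate pairs (3.3.4).
    private final Map<String, Set<String>> table = new HashMap<>();

    // Simplified stand-in for LexItem norm: lowercase, ASCII punctuation to spaces, collapse spaces.
    static String norm(String text) {
        return text.toLowerCase(Locale.ROOT)
                .replaceAll("\\p{Punct}", " ")
                .trim()
                .replaceAll("\\s+", " ");
    }

    // Reads "term|synonym" lines; BufferedReader.lines() reports I/O errors as UncheckedIOException.
    void load(BufferedReader reader) {
        reader.lines().forEach(this::addLine);
    }

    private void addLine(String rawLine) {
        String line = rawLine.trim();
        if (line.isEmpty() || line.startsWith("#")) {
            return;                                    // 3.3.3: '#' marks a comment line
        }
        String[] fields = line.split("\\|");           // 3.3.7: pipe-separated fields
        if (fields.length != 2) {
            return;                                    // 3.3.6: two fields; other lines skipped (assumption)
        }
        String a = fields[0].trim();
        String b = fields[1].trim();
        add(a, b);
        add(b, a);                                     // 3.3.5: generate the symmetric pair
    }

    // 3.3.8: only the key is normalized; the value is stored as written.
    private void add(String key, String value) {
        table.computeIfAbsent(norm(key), k -> new LinkedHashSet<>()).add(value);
    }

    // 3.4.1: direct synonyms of the normalized input.
    Set<String> synonyms(String input) {
        return Collections.unmodifiableSet(table.getOrDefault(norm(input), Set.of()));
    }

    // 3.4.2 / 3.4.3: synonyms reachable within maxDepth hops (depth 1 = direct synonyms).
    Set<String> recursiveSynonyms(String input, int maxDepth) {
        String start = norm(input);
        Set<String> visited = new HashSet<>(Set.of(start));   // normalized forms already reached
        Set<String> found = new LinkedHashSet<>();
        List<String> frontier = List.of(start);
        for (int depth = 1; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String key : frontier) {
                for (String synonym : table.getOrDefault(key, Set.of())) {
                    String synonymKey = norm(synonym);
                    if (visited.add(synonymKey)) {
                        found.add(synonym);
                        next.add(synonymKey);
                    }
                }
            }
            frontier = next;
        }
        return found;
    }

    public static void main(String[] args) {
        String file = String.join("\n",
                "# hypothetical customized synonyms",
                "heart attack|myocardial infarction",
                "myocardial infarction|MI",
                "heart attack|myocardial infarction");   // duplicate pair, ignored
        SynonymSketch source = new SynonymSketch();
        source.load(new BufferedReader(new StringReader(file)));
        System.out.println(source.synonyms("HEART ATTACK"));
        System.out.println(source.recursiveSynonyms("heart attack", 2));
    }
}
```

Running `main` prints `[myocardial infarction]` and then `[myocardial infarction, MI]`. The second line includes `MI`, reached through `myocardial infarction`: the kind of recursive synonym that 3.2.8 excludes from the basic definition and that 3.4.2 and 3.4.3 expose on request. In this sketch, passing `Integer.MAX_VALUE` as the depth approximates the unbounded case.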