SMT: Synonym Files
This page describes how to convert your own synonym file into an SMT standard normalized synonym file and then load it into the SMT corpus tree.
Terms are normalized with the following Lvg flows:

Variations | Standardization | Lvg flow | Notes and Examples |
---|---|---|---|
genitive | remove genitive | -f:g | |
parenthetic plural form | remove (s), (es), (ies) | -f:rs | Needed to standardize the term (not the synonym) |
spelling variants, inflectional variants | get the citation form of every word in the term | -f:Ct | |
punctuation | replace with space | -f:o | |
upper, lower, and mixed case | lowercase | -f:l | |
stop words | not implemented | -f:t | |
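To see what each normalization step does to a term, the flows in the table above can be approximated in plain Python. This is a minimal sketch, not Lvg itself: the regular expressions are assumptions, the order (genitive, parenthetic plural, punctuation, lowercase) is chosen so that apostrophes and parentheses are handled before punctuation is replaced, and the citation-form step (-f:Ct) is left out because it depends on Lvg's lexicon lookup. Stop words are skipped, as in the table.

```python
import re

# Rough approximations of the Lvg flows -f:g, -f:rs, -f:o and -f:l.
# The regular expressions below are assumptions, not Lvg's own rules.
# -f:Ct (citation form) is omitted: it needs Lvg's lexicon lookup.

def remove_genitive(term: str) -> str:
    """Approximates -f:g: "Crohn's" -> "Crohn", "patients'" -> "patients"."""
    term = re.sub(r"(\w)'s\b", r"\1", term)
    return re.sub(r"s'(?=\s|$)", "s", term)

def remove_parenthetic_plural(term: str) -> str:
    """Approximates -f:rs: strips "(s)", "(es)" and "(ies)" suffixes."""
    return re.sub(r"\((?:s|es|ies)\)", "", term)

def replace_punctuation(term: str) -> str:
    """Approximates -f:o: punctuation becomes a space; spaces are collapsed."""
    return " ".join(re.sub(r"[^\w\s]", " ", term).split())

def lowercase(term: str) -> str:
    """Approximates -f:l."""
    return term.lower()

def normalize(term: str) -> str:
    # Genitives and parenthetic plurals must be handled before punctuation
    # is replaced, otherwise the ' and ( ) characters are already gone.
    for step in (remove_genitive, remove_parenthetic_plural,
                 replace_punctuation, lowercase):
        term = step(term)
    return term

print(normalize("Crohn's Disease"))         # crohn disease
print(normalize("Non-Hodgkin's Lymphoma"))  # non hodgkin lymphoma
print(normalize("finger(s)"))               # finger
```

For real synonym files, run the actual Lvg flows instead; the sketch only illustrates why the order of the steps matters.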
The synonym file itself is processed as follows:

Features | Descriptions | User's Input | SMT Standard Normalized |
---|---|---|---|
`#` for comments | A line that starts with `#` is a comment | `# This is a comment` | |
Duplicates | Remove all duplicate synonym pairs | | |
Redundancy | Remove a synonym pair if its two synonym terms are identical | | |
Norm Redundancy | Remove a synonym pair if the normalized synonym terms are identical. This step is optional, because such a substitution cannot find any new CUI | | |
Bi-direction | Generate symmetric synonym pairs | | |
Multiple synonym pairs | Convert a group of synonyms into all possible pairs | | |
Recursive synonyms | Not implemented (add them manually) | | |
Suffixes | Not implemented (remove them manually) | | |
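The features in the table above can be sketched as a single pass over the input lines. This is a minimal illustration under stated assumptions, not the SMT implementation: the input layout (one group of synonyms per line, separated by `|`), the function name `standardize_synonyms`, and the stand-in normalizer `default_norm` (lowercase plus whitespace collapsing, in place of the full Lvg normalization) are all hypothetical.

```python
from itertools import combinations
from typing import Callable, Iterable, List, Optional, Tuple

def default_norm(term: str) -> str:
    # Hypothetical stand-in for the Lvg normalization described earlier:
    # lowercase and collapse whitespace only.
    return " ".join(term.lower().split())

def standardize_synonyms(
    lines: Iterable[str],
    delimiter: str = "|",
    norm: Optional[Callable[[str], str]] = default_norm,
) -> List[Tuple[str, str]]:
    """Turn raw synonym lines into sorted, de-duplicated, symmetric pairs.

    Assumed input layout: one group of synonyms per line, separated by
    `delimiter`. Pass norm=None to skip the optional norm-redundancy check.
    """
    pairs = set()  # a set drops duplicate pairs
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        terms = [t.strip() for t in line.split(delimiter) if t.strip()]
        # Multiple synonyms on one line -> every possible pair.
        for a, b in combinations(terms, 2):
            if a == b:
                continue  # redundancy: identical terms
            if norm is not None and norm(a) == norm(b):
                continue  # norm redundancy: no new CUI can be found
            pairs.add((a, b))
            pairs.add((b, a))  # bi-direction
    return sorted(pairs)

sample = [
    "# This is a comment",
    "heart attack|myocardial infarction|MI",
    "MI|heart attack",  # duplicates pairs from the line above
    "cancer|cancer",    # redundant
    "Tumor|tumor",      # norm redundant
]
for a, b in standardize_synonyms(sample):
    print(f"{a}|{b}")
```

Recursive synonyms and suffixes are not handled here, matching the table.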
Set SYNONYM_FILE in the SMT configuration file (${STMT}/data/Config/stm.properties) to the standard normalized synonym file generated above.
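If the synonym file is regenerated often, the SYNONYM_FILE line can be updated by a script. A minimal sketch, assuming the configuration file uses plain `key=value` lines in Java `.properties` style; the helper name `set_property` and the file path in the example are hypothetical.

```python
import re

def set_property(text: str, key: str, value: str) -> str:
    """Return `text` with `key=value` set, appending the line if `key` is absent.

    Simplified .properties handling (assumption): only `key=value` lines at
    the start of a line are matched; `:` separators and continuation lines
    are not supported.
    """
    pattern = re.compile(rf"^{re.escape(key)}\s*=.*$", re.MULTILINE)
    new_line = f"{key}={value}"
    if pattern.search(text):
        # A function replacement keeps backslashes in `value` literal.
        return pattern.sub(lambda _m: new_line, text)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + new_line + "\n"

config = "# SMT configuration\nSYNONYM_FILE=old/synonyms.txt\n"
# Hypothetical path to the standard normalized synonym file.
print(set_property(config, "SYNONYM_FILE", "data/Synonyms/mySynonyms.norm.txt"))
```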