SD-Rules Optimization - Introduction
After the Lexical.2013 release, all possible SD-pairs that match known SD-Rules from the Lexicon were reviewed and tagged, either automatically by computer programs or manually by linguists. All valid SD-pairs are added to the derivation Facts and stored in database tables, from which derivation-related flow components retrieve derivations. Thus, SD-Rules are no longer needed for retrieving derivations that are known to the Lexicon.
SD-Rules are now mainly used to retrieve derivations that are not in the Lexicon. To provide wider coverage, these SD-Rules are expected to keep the same high accuracy rate when they are applied to general English corpora other than the Lexicon, under the assumption that the characteristics of SD-Rules derived from the Lexicon are consistent across the entire English domain. Accordingly, a good set of SD-Rules should result in a better accuracy rate and coverage rate. The following filters are also recommended for use with SD-Rules to improve the accuracy rate:
depart|verb|department|noun
color|verb|colorment|noun
To use a domain filter for a domain other than the Lexicon, all words of the working corpus (such as Medline) need to be collected and added to the database.
mo|verb|moment|noun
la|verb|lament|noun
It is common for an SD-pair to meet more than one of the above filters. For example, the SD-pair mo|verb|moment|noun meets the domain filter, the word length filter, and the stem length filter. In any case, an SD-pair should be filtered out if it meets at least one of the above filters.
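
The "filter out if any filter is met" logic can be illustrated with a minimal Python sketch. This is not the actual implementation: the text above names the filters but does not define them, so the definitions here are assumptions. The domain filter is taken to mean that both words must exist in the working domain, the word length filter that neither word may be shorter than a minimum, and the stem length filter that the shared prefix of the two words may not be shorter than a minimum. The thresholds (MIN_WORD_LENGTH, MIN_STEM_LENGTH), the function names, and the tiny domain word set are all hypothetical.

```python
from typing import List, NamedTuple, Set


class SdPair(NamedTuple):
    base: str
    base_cat: str
    derivative: str
    derivative_cat: str


def parse_sd_pair(line: str) -> SdPair:
    """Parse a 'base|category|derivative|category' line into an SdPair."""
    base, base_cat, derivative, derivative_cat = line.strip().split("|")
    return SdPair(base, base_cat, derivative, derivative_cat)


# Hypothetical thresholds, chosen only for illustration.
MIN_WORD_LENGTH = 3
MIN_STEM_LENGTH = 3


def common_prefix_length(a: str, b: str) -> int:
    """Length of the longest shared prefix (used here as the assumed 'stem')."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def met_filters(pair: SdPair, domain_words: Set[str]) -> List[str]:
    """Return the names of all (assumed) filters that the SD-pair meets."""
    met = []
    # Assumed domain filter: both words must be known in the working domain.
    if pair.base not in domain_words or pair.derivative not in domain_words:
        met.append("domain")
    # Assumed word length filter: neither word may be too short.
    if min(len(pair.base), len(pair.derivative)) < MIN_WORD_LENGTH:
        met.append("word length")
    # Assumed stem length filter: the shared prefix may not be too short.
    if common_prefix_length(pair.base, pair.derivative) < MIN_STEM_LENGTH:
        met.append("stem length")
    return met


def is_filtered_out(pair: SdPair, domain_words: Set[str]) -> bool:
    """An SD-pair is filtered out as soon as it meets at least one filter."""
    return bool(met_filters(pair, domain_words))


if __name__ == "__main__":
    # Hypothetical stand-in for the word list of a working domain.
    domain = {"depart", "department", "color", "moment", "lament"}
    for line in ("mo|verb|moment|noun", "color|verb|colorment|noun"):
        pair = parse_sd_pair(line)
        print(line, met_filters(pair, domain), is_filtered_out(pair, domain))
```

Under these assumptions, mo|verb|moment|noun meets all three filters, while color|verb|colorment|noun meets only the domain filter; both are filtered out. Returning the list of filters met, rather than a single boolean, makes it easy to see how often the filters overlap.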