Lexical Tools

SD-Rules Optimization - Introduction

Since the Lexical.2013 release, all possible SD-pairs that match known SD-Rules from the Lexicon have been reviewed and tagged, either automatically by computer programs or manually by linguists. All valid SD-pairs are added to the derivation Facts and stored in database tables, from which derivation-related flow components retrieve derivations. Thus, SD-Rules are no longer needed for retrieving derivations that are known to the Lexicon.

SD-Rules are now mainly used to retrieve derivations that are not in the Lexicon. To provide wider coverage, these SD-Rules are expected to keep the same high accuracy rate when they are applied to general English corpora other than the Lexicon, under the assumption that the characteristics of SD-Rules derived from the Lexicon are consistent across the entire English domain. Accordingly, a good set of SD-Rules should result in better accuracy and coverage rates. The following filters are also recommended for integration with SD-Rules to improve the accuracy rate:

  • Exception filter:
    SD-pairs that are tagged as invalid should be added to the exceptions and filtered out. For example: depart|verb|department|noun
  • Domain filter:
    If a generated word is not in the working domain (corpus), such as the Lexicon, it is not considered a real word and should be filtered out. For example, "colorment" in the SD-pair color|verb|colorment|noun. To use a domain filter based on a corpus other than the Lexicon, all words of the working corpus (such as Medline) must be collected and added to the database.
  • Min. word length filter:
    A term should be filtered out if it is too short (the default minimum length is 3). For example, "mo" has a length of 2 and should be filtered out in the SD-pair mo|verb|moment|noun
  • Min. stem length filter:
    A word should be filtered out if its stem length (the word length minus the suffix length) is too small (the default minimum is 3). For example, "lament" has a stem length of 2 (= 6 - 4) and should be filtered out in the SD-pair la|verb|lament|noun

An SD-pair often triggers more than one of the above filters. For example, the SD-pair mo|verb|moment|noun is caught by the domain filter, the word length filter, and the stem length filter. In any case, an SD-pair should be filtered out if it triggers at least one of the above filters.
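
The four filters can be sketched in Python as follows. This is only an illustration, not the Lexical Tools implementation: the function name failed_filters, the in-memory sets standing in for the database tables, and the explicit suffix argument are all hypothetical. It also assumes that "default is 3" means lengths below 3 are rejected, and that the domain filter checks both words of the pair.

```python
def failed_filters(pair, suffix, domain, exceptions,
                   min_word_len=3, min_stem_len=3):
    """Return the names of the filters that reject an SD-pair.

    pair       -- "base|category|derived|category", e.g. "mo|verb|moment|noun"
    suffix     -- suffix carried by the derived word (assumed known from the SD-Rule)
    domain     -- set of words in the working corpus (hypothetical stand-in for the database)
    exceptions -- set of SD-pairs tagged as invalid
    """
    base, _base_cat, derived, _derived_cat = pair.split("|")
    failed = []

    # Exception filter: the pair was reviewed and tagged as invalid.
    if pair in exceptions:
        failed.append("exception")

    # Domain filter: both words must exist in the working corpus.
    if base not in domain or derived not in domain:
        failed.append("domain")

    # Min. word length filter: neither word may be shorter than the minimum.
    if min(len(base), len(derived)) < min_word_len:
        failed.append("word_length")

    # Min. stem length filter: word length minus suffix length.
    if len(derived) - len(suffix) < min_stem_len:
        failed.append("stem_length")

    return failed


# Toy data for illustration only.
domain = {"depart", "department", "color", "moment", "lament",
          "treat", "treatment"}
exceptions = {"depart|verb|department|noun"}

for pair in ["treat|verb|treatment|noun",
             "depart|verb|department|noun",
             "color|verb|colorment|noun",
             "mo|verb|moment|noun"]:
    print(pair, failed_filters(pair, "ment", domain, exceptions))
```

A pair is kept only when the returned list is empty. Returning every failed filter, rather than stopping at the first one, makes it easy to see cases such as mo|verb|moment|noun that are caught by several filters at once.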