Lexical Tools

SD-Rule Transaction Details: 2015 to 2016

The detailed transactions of the SD-Rules are described below:

  • The following table shows the transactions on the 12 newly proposed SD-Rules in 2016.

    Computer-Generated SD-Rules
    ID       Proposed New Rule    Source   Results  Rank & Rule 2015          Rank & Rule 2016              Type            Count Change  Accu. Count
    01-CG1   e$|verb|is$|noun     nomD     Good     34: ose$|verb|osis$|noun  27: se$|verb|sis$|noun        Parent-1-Child  +0            75*
    02-CG2   sia$|noun|tic$|adj   orgD     Good     None                      40: sia$|noun|tic$|adj        New in 2016     +1            76
    03-CG3   on$|noun|ve$|adj     orgD     Good     None                      48: on$|noun|ve$|adj          New in 2016     +1            77
    04-CG4   e$|noun|ic$|adj      orgD     Good     None                      49: e$|noun|ic$|adj           New in 2016     +1            78
    05-CG5   $|adj|ism$|noun      nomD     Good     None                      51: $|adj|ism$|noun           New in 2016     +1            79
    06-CG6   ation$|noun|ed$|adj  nomD     Good     None                      67: ation$|noun|ed$|adj       New in 2016     +1            80
    07-CG7   $|noun|ship$|noun    orgD     Good     None                      70: $|noun|ship$|noun         New in 2016     +1            81
    08-CG8   e$|adj|ion$|noun     nomD     Bad      None                      88: e$|adj|ion$|noun          New in 2016     +0            81
    09-CG9   $|noun|age$|noun     orgD     Bad      None                      96: $|noun|age$|noun          New in 2016     +0            81
    10-CG10  e$|verb|ing$|noun    nomD     Good     44: e$|verb|ing$|noun     47: e$|verb|ion$|noun         Duplicate       +0            81
    Expert-Suggested SD-Rules
    11-ES1   esis$|noun|ic$|adj   Experts  Good     None                      13: genesis$|noun|genic$|adj  New in 2016     +1            82
    12-ES2   al$|adj|ine$|noun    Experts  Bad      None                      98: al$|adj|ine$|noun         New in 2016     +0            82

    * 75 of the 76 good SD-Rules from 2015 are still evaluated as good rules in 2016; each is either identical to its 2015 form or replaced by its parent-rule or child-rule. Only the lowest-ranked rule (rank 76) in the previous optimal set, ar$|adj|e$|noun, is evaluated as a bad rule in the 2016 release. This is why the accumulated count starts at 75 (76 - 1).
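To make the rule notation and the Accu. Count column concrete, here is a minimal Python sketch. It assumes that an SD-Rule written as `inSuffix$|inCategory|outSuffix$|outCategory` is applied by replacing the input suffix with the output suffix, with `$` marking the end of the word. The names `SdRule`, `parse_rule` and `apply_rule` are hypothetical and are not taken from the Lexical Tools distribution; the count changes are copied from the table above.

```python
from itertools import accumulate
from typing import NamedTuple, Optional


class SdRule(NamedTuple):
    """One SD-Rule written as 'inSuffix$|inCategory|outSuffix$|outCategory'."""
    in_suffix: str
    in_cat: str
    out_suffix: str
    out_cat: str


def parse_rule(text: str) -> SdRule:
    """Split on '|' and drop the '$' end-of-word markers from both suffixes."""
    in_sfx, in_cat, out_sfx, out_cat = text.split("|")
    return SdRule(in_sfx.rstrip("$"), in_cat, out_sfx.rstrip("$"), out_cat)


def apply_rule(rule: SdRule, word: str) -> Optional[str]:
    """Replace the input suffix with the output suffix (assumed semantics).

    Returns None when the word does not end with the input suffix.
    """
    if not word.endswith(rule.in_suffix):
        return None
    stem = word[: len(word) - len(rule.in_suffix)]
    return stem + rule.out_suffix


# Count Change column, rows 01 to 12 in table order.
COUNT_CHANGES = [0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0]
# 76 good rules in 2015, minus ar$|adj|e$|noun, which turned bad in 2016.
BASE = 76 - 1
# accumulate(..., initial=BASE) emits BASE first; drop it to align with the rows.
ACCU_COUNTS = list(accumulate(COUNT_CHANGES, initial=BASE))[1:]

if __name__ == "__main__":
    print(apply_rule(parse_rule("genesis$|noun|genic$|adj"), "carcinogenesis"))
    print(ACCU_COUNTS)
```

Running it prints `carcinogenic` and `[75, 76, 77, 78, 79, 80, 81, 81, 81, 81, 82, 82]`, which matches the Accu. Count column.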

  • Count of good SD-Rules in the optimal set:
    • The 2015 optimal set has 76 good rules, while the 2016 optimal set has 82.
    • The evaluation found 8 of the 12 new rules to be good (3 bad, 1 duplicate). Why did the number of good SD-Rules increase by only 6 (from 76 to 82) rather than reach 84 (76 + 8)? Because:
      • 1 good rule from 2015 fell below the cutoff and became a bad rule (-1).
      • 1 good new rule is the parent-rule of 1 existing rule, so it replaces that rule (+0).
      • 7 good new rules have no parent-child relationship with any existing rule (+7).

      • So, the total change is 7 - 1 = 6.
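The same arithmetic, written out as a short Python sketch; the constant names are illustrative only and the figures are the ones listed above:

```python
# Figures taken from the bullet list above.
GOOD_2015 = 76
NEW_GOOD = 8            # good new rules (the 3 bad rules and 1 duplicate are excluded)
TURNED_BAD = 1          # 2015 good rule that fell below the cutoff
PARENT_OF_EXISTING = 1  # good new rule that replaces an existing child-rule


def good_2016() -> int:
    """Good-rule count after applying the three adjustments."""
    # A new parent-rule takes its child's place, so it contributes +0.
    independent_new = NEW_GOOD - PARENT_OF_EXISTING  # 7
    return GOOD_2015 - TURNED_BAD + independent_new


naive_total = GOOD_2015 + NEW_GOOD  # 84: ignores both adjustments

if __name__ == "__main__":
    print(good_2016(), naive_total)  # 82 84
```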

  • Comparison of good rules (2015 vs. 2016):
    Type                  2015  2016  Details
    No Change               74    74  ...
    Good rule turned bad     1     0  ar$|adj|e$|noun
    Parent-1-Child           1     1  2015: 34: ose$|verb|osis$|noun
                                      2016: 27: se$|verb|sis$|noun
    New in 2016              0     7  • 13: genesis$|noun|genic$|adj
                                      • 40: sia$|noun|tic$|adj
                                      • 48: on$|noun|ve$|adj
                                      • 49: e$|noun|ic$|adj
                                      • 51: $|adj|ism$|noun
                                      • 67: ation$|noun|ed$|adj
                                      • 70: $|noun|ship$|noun
    Total                   76    82
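The Parent-1-Child row depends on recognizing that one rule is the parent of another. The following minimal Python sketch assumes that a child-rule is its parent-rule with the same non-empty string prepended to both suffixes and with identical categories; for example, 34: ose$|verb|osis$|noun is 27: se$|verb|sis$|noun with `o` prepended to both suffixes. The helper names `parse_rule` and `is_parent_rule` are hypothetical, and the check also accepts more distant ancestors.

```python
from typing import Tuple

# (in_suffix, in_category, out_suffix, out_category)
Rule = Tuple[str, str, str, str]


def parse_rule(text: str) -> Rule:
    """Turn 'se$|verb|sis$|noun' into ('se', 'verb', 'sis', 'noun')."""
    in_sfx, in_cat, out_sfx, out_cat = text.split("|")
    return in_sfx.rstrip("$"), in_cat, out_sfx.rstrip("$"), out_cat


def is_parent_rule(parent_text: str, child_text: str) -> bool:
    """True if the child equals the parent with the same non-empty string
    prepended to both suffixes and identical categories (assumed definition;
    more distant ancestors are accepted as well)."""
    p_in, p_in_cat, p_out, p_out_cat = parse_rule(parent_text)
    c_in, c_in_cat, c_out, c_out_cat = parse_rule(child_text)
    if (p_in_cat, p_out_cat) != (c_in_cat, c_out_cat):
        return False
    # The child must be strictly longer and end with the parent's suffixes.
    if len(c_in) <= len(p_in) or not c_in.endswith(p_in) or not c_out.endswith(p_out):
        return False
    extra_in = c_in[: len(c_in) - len(p_in)]
    extra_out = c_out[: len(c_out) - len(p_out)]
    return extra_in == extra_out


if __name__ == "__main__":
    # Parent-1-Child row: rank 27 (2016) is the parent of rank 34 (2015).
    print(is_parent_rule("se$|verb|sis$|noun", "ose$|verb|osis$|noun"))  # True
    print(is_parent_rule("ose$|verb|osis$|noun", "se$|verb|sis$|noun"))  # False
```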

  • In our current process, we analyze the parent-child hierarchy only for SD-Rules whose parent-rules and child-rules co-exist in the collected set, because evaluating all parent-child rules is very expensive (time-consuming). Should we modify the process as follows?
    • Normalize each SD-Rule to its root-parent-rule.
    • Analyze the parent-child hierarchy for all SD-Rules.

    In 2016, we spent about 2 weeks evaluating 16 parent-rules. Under the modified process, there would be 101 parent-rules to evaluate, which would be very expensive.
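The proposed normalization could look roughly like the following Python sketch. It assumes that a rule's root-parent-rule is obtained by repeatedly dropping the leading character shared by both suffixes (for example, ose$|verb|osis$|noun becomes se$|verb|sis$|noun and then e$|verb|is$|noun). The names `root_parent`, `group_by_root` and `estimated_weeks` are hypothetical, and the cost estimate simply scales the 2016 effort (about 2 weeks for 16 parent-rules) linearly, which is itself an assumption.

```python
from collections import defaultdict
from typing import Dict, List


def root_parent(rule: str) -> str:
    """Return the assumed root-parent-rule of an SD-Rule.

    Leading characters shared by the input and output suffixes are dropped one
    at a time; each step yields the next parent, and the loop stops at the root.
    """
    in_sfx, in_cat, out_sfx, out_cat = rule.split("|")
    in_sfx, out_sfx = in_sfx.rstrip("$"), out_sfx.rstrip("$")
    while in_sfx and out_sfx and in_sfx[0] == out_sfx[0]:
        in_sfx, out_sfx = in_sfx[1:], out_sfx[1:]
    return f"{in_sfx}$|{in_cat}|{out_sfx}$|{out_cat}"


def group_by_root(rules: List[str]) -> Dict[str, List[str]]:
    """Collect rules under their root-parent-rule, keeping input order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for rule in rules:
        groups[root_parent(rule)].append(rule)
    return dict(groups)


def estimated_weeks(parent_count: int, base_weeks: float = 2.0, base_parents: int = 16) -> float:
    """Linear extrapolation from 2016: about 2 weeks for 16 parent-rules."""
    return base_weeks * parent_count / base_parents


if __name__ == "__main__":
    print(group_by_root(["se$|verb|sis$|noun", "ose$|verb|osis$|noun"]))
    print(estimated_weeks(101))  # 12.625
```

Under these assumptions, both 27: se$|verb|sis$|noun and 34: ose$|verb|osis$|noun normalize to e$|verb|is$|noun, and 101 parent-rules would take about 12.6 weeks, roughly six times the 2016 effort.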

In conclusion, the optimal set of SD-Rules is very stable, as we expected. Does this imply that the Lexicon is a good representative subset of general English?