Lexical Tools

SD-Rule Transaction Details: 2014 to 2015

The SD-Rule transactions from 2014 to 2015 are described below:

  • Baseline of candidate SD-Rule count:
    • 2014 baseline collects 107 SD-Rules.
    • 2015 baseline collects 120 SD-Rules: 15 new SD-Rules are added to the 107 SD-Rules collected in 2014, and two of the new rules are duplicates because they are child-rules (120 = 107 + 15 - 2).
    • The baseline set is then processed to remove duplicates arising from parent-child relationships. In 2015, 19 child-rules are removed from the 120 baseline SD-Rules, leaving 101 unique SD-Rules (120 - 19 = 101).

  • Good SD-Rules count in Optimal Set:
    • 2014 has 73 good rules while 2015 has 76 good rules in the optimal set.
    • All 73 good SD-Rules in 2014 are still good rules in 2015. They are either identical or replaced by their parent-rules or child-rules.
    • From the evaluation, 11 of the 15 new rules are good. Why does the total number of good SD-Rules increase by only 3 (from 73 to 76), rather than reaching 84 (73 + 11)? The reasons are:
      • 4 of the new rules are parent-rules of 4 existing rules, so each new rule replaces one existing rule (+0).
      • 2 of the new rules are parent-rules of 4 existing rules, so each new rule replaces two existing rules (-2).
      • 5 of the new rules have no parent-child relationship with any existing rule (+5).

      • So, the total change is 0 - 2 + 5 = 3.
      This involves a complicated parent-child rule situation; please see the SD-Rule rank mapping for details. The changes are summarized below:

      Type            2014  2015  Details
      No Change       65    65    ...
      Parent-1-Child  4     4     2014                         2015
                                  02: ability$|noun|able$|adj  09: ility$|noun|le$|adj
                                  08: ic$|adj|ically$|adv      15: $|adj|ally$|adv
                                  21: ency$|noun|ent$|adj      19: cy$|noun|t$|adj
                                  55: ion$|noun|ional$|adj     70: $|noun|al$|adj
      Parent-2-Child  4     2     2014                         2015
                                  16: ance$|noun|ant$|adj      18: nce$|noun|nt$|adj
                                  18: ence$|noun|ent$|adj
                                  10: ate$|verb|ation$|noun    20: e$|verb|ion$|noun
                                  63: se$|verb|sion$|noun
      New in 2015     0     5     • 02: se$|verb|zation$|noun
                                  • 03: sation$|noun|ze$|verb
                                  • 45: e$|verb|ing$|noun
                                  • 61: al$|adj|us$|noun
                                  • 67: es$|noun|ic$|adj
      Total           73    76
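
The accounting above can be checked with a short sketch. The counts come from this page; the tuple layout and variable names are illustrative assumptions, not part of the Lexical Tools.

```python
# Each entry: (type, good rules added in 2015, good 2014 rules they replace).
# Counts are taken from the summary above; the data layout is hypothetical.
changes = [
    ("Parent-1-Child", 4, 4),  # 4 new parent-rules replace 4 child-rules
    ("Parent-2-Child", 2, 4),  # 2 new parent-rules replace 4 child-rules
    ("New in 2015", 5, 0),     # 5 new rules unrelated to existing rules
]

good_2014 = 73
net_change = sum(added - replaced for _, added, replaced in changes)
good_2015 = good_2014 + net_change

print(f"net change: {net_change:+d}, good rules in 2015: {good_2015}")
```

The 65 unchanged rules do not appear in the list because they contribute nothing to the net change.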

    • The following table shows the transactions on the 15 newly proposed SD-Rules in 2015.

      Computer Generated SD-Rules
      ID      Proposed New Rule      Source   Results  Rank & Rule 2015           Rank & Rule 2014             Type              Count Change  Accu. Count
      01-CG1  se$|verb|zation$|noun  nomD     Good     02: se$|verb|zation$|noun  None                         New in 2015       +1            74
      02-CG2  sation$|noun|ze$|verb  nomD     Good     03: sation$|noun|ze$|verb  None                         New in 2015       +1            75
      03-CG3  ility$|noun|le$|adj    nomD     Good     09: ility$|noun|le$|adj    02: ability$|noun|able$|adj  Parent-1-Child    +0            75
      04-CG4  $|adj|ally$|adv        orgD     Good     15: $|adj|ally$|adv        08: ic$|adj|ically$|adv      Parent-1-Child    +0            75
      05-CG5  nce$|noun|nt$|adj      nomD     Good     18: nce$|noun|nt$|adj      16: ance$|noun|ant$|adj      Parent-2-Child    -1            74
                                                                                  18: ence$|noun|ent$|adj
      06-CG6  cy$|noun|t$|adj        nomD     Good     19: cy$|noun|t$|adj        21: ency$|noun|ent$|adj      Parent-1-Child    +0            74
      07-CG7  e$|verb|ion$|noun      nomD     Good     20: e$|verb|ion$|noun      10: ate$|verb|ation$|noun    Parent-2-Child    -1            73
                                                                                  63: se$|verb|sion$|noun
      08-CG8  c$|adj|s$|noun         orgD     Good     43: ic$|adj|is$|noun       41: ic$|adj|is$|noun         Child             +0            73
      Expert-Suggested SD-Rules
      09-ES1  e$|verb|ing$|noun      Experts  Good     45: e$|verb|ing$|noun      None                         New in 2015       +1            74
      10-ES2  al$|adj|us$|noun       Experts  Good     61: al$|adj|us$|noun       None                         New in 2015       +1            75
      11-ES3  es$|noun|ic$|adj       Experts  Good     67: es$|noun|ic$|adj       None                         New in 2015       +1            76
      12-ES4  $|noun|ize$|verb       Experts  Bad      78: $|noun|ize$|verb       None                         New               +0            76
      13-ES5  es$|noun|ic$|noun      Experts  Bad      101: es$|noun|ic$|noun     None                         New               +0            76
      14-ES6  ian$|adj|ia$|noun      Experts  Good     57: a$|noun|an$|adj        53: a$|noun|an$|adj          Duplicated-Child  +0            76
      15-ES7  ian$|noun|ia$|noun     Experts  Bad      99: a$|noun|an$|noun       93: a$|noun|an$|noun         Duplicated-Child  +0            76
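
As a cross-check on the accumulated count column, the per-rule count changes can be replayed as a running total. This is only a sketch: the IDs and changes are copied from the table, while the list layout and variable names are assumptions made for illustration.

```python
from itertools import accumulate

# (rule ID, count change) copied from the table above; layout is hypothetical.
count_changes = [
    ("01-CG1", +1), ("02-CG2", +1), ("03-CG3", 0), ("04-CG4", 0),
    ("05-CG5", -1), ("06-CG6", 0), ("07-CG7", -1), ("08-CG8", 0),
    ("09-ES1", +1), ("10-ES2", +1), ("11-ES3", +1), ("12-ES4", 0),
    ("13-ES5", 0), ("14-ES6", 0), ("15-ES7", 0),
]

start = 73  # good SD-Rules in the 2014 optimal set

# accumulate(..., initial=start) yields the start value first, so drop it.
deltas = [delta for _, delta in count_changes]
running = list(accumulate(deltas, initial=start))[1:]

for (rule_id, delta), total in zip(count_changes, running):
    print(f"{rule_id}  {delta:+d}  {total}")
```

The last running value is the 2015 total of good SD-Rules in the optimal set.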

    • In the evaluation process, we removed two proposed new rules (ES6 and ES7) because they are child rules of existing rules. After normalization (alphabetic order and use of the root-parent-rule), they duplicate existing rules. Thus, we did not analyze the parent-child hierarchy for these two rules. Should we analyze them in future releases?
    • In our process, we analyze the parent-child hierarchy only for SD-Rules whose parent and child rules co-exist in the collected set, because the analysis is very expensive. Should we modify the process as follows?
      • Normalize all SD-Rules to their root-parent-rules.
      • Analyze the parent-child hierarchy for all SD-Rules.

      In 2015, we have 14 parent rules. With the modified process, there would be 101 parent rules, which is very expensive.
    • 2015 has 10 more root parent rules.
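
The parent-child relationship used throughout this page can be illustrated with a minimal sketch. It assumes, based only on the example pairs listed above (for instance ability$|noun|able$|adj under ility$|noun|le$|adj), that a child rule is a parent rule with the same leading characters added to both suffixes and the same categories. The actual definition used by the Lexical Tools may differ, and the names SdRule, parse_rule, and is_child_of are hypothetical.

```python
from typing import NamedTuple


class SdRule(NamedTuple):
    suffix_a: str
    cat_a: str
    suffix_b: str
    cat_b: str


def parse_rule(text: str) -> SdRule:
    """Parse 'ian$|adj|ia$|noun' and put the two sides in alphabetic order."""
    s1, c1, s2, c2 = text.split("|")
    # Drop the trailing '$' end-of-word marker, then sort the two sides.
    (a, ca), (b, cb) = sorted([(s1.rstrip("$"), c1), (s2.rstrip("$"), c2)])
    return SdRule(a, ca, b, cb)


def is_child_of(child: str, parent: str) -> bool:
    """Assumed relation: child = parent with one shared prefix on both suffixes."""
    c, p = parse_rule(child), parse_rule(parent)
    if (c.cat_a, c.cat_b) != (p.cat_a, p.cat_b):
        return False
    extra = len(c.suffix_a) - len(p.suffix_a)
    if extra <= 0:
        return False  # a child must be strictly longer than its parent
    prefix = c.suffix_a[:extra]
    return (c.suffix_a == prefix + p.suffix_a
            and c.suffix_b == prefix + p.suffix_b)


print(is_child_of("ability$|noun|able$|adj", "ility$|noun|le$|adj"))
print(is_child_of("ian$|adj|ia$|noun", "a$|noun|an$|adj"))
print(is_child_of("e$|verb|ing$|noun", "e$|verb|ion$|noun"))
```

Checking every rule against every other rule this way takes a number of comparisons that grows with the square of the set size, and that is before any evaluation of the resulting hierarchy, which is consistent with the cost concern raised above.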

The conclusion is that the optimized set of SD-Rules is very stable, as we expected. Does this imply that the Lexicon is a good representative subset of general English?