Lexical Tools

Comparison of the Optimized Sets for 2014 and 2015

I. From 2014 to 2015:

The 2014 optimized set is based on 2013 SD-Rule data and serves as the baseline for 2015. Fifteen new SD-Rules were added to the 2014 SD-Rule set for evaluation and used for the 2015 release. Of these, 11 were evaluated as good rules in the optimized set, 2 as bad rules, and 2 as duplicates (child rules of existing rules). In addition, 2 child rules are used in the optimized set in place of the proposed rules.

SD-Rule                Precision  Instances  Source       Results
Good Rules
se$|verb|zation$|noun  100.00%    1108       NOM_D        Good SD-Rule
sation$|noun|ze$|verb  100.00%    1071       NOM_D        Good SD-Rule
ility$|noun|le$|adj    99.94%     1625       NOM_D        Good SD-Rule
$|adj|ally$|adv        99.08%     2072       ORG_D        Good SD-Rule
ce$|noun|t$|adj        98.82%     847        NOM_D        Child rule nce$|noun|nt$|adj is used
cy$|noun|t$|adj        98.77%     406        NOM_D        Good SD-Rule
e$|verb|ion$|noun      98.76%     2336       NOM_D        Good SD-Rule
c$|adj|s$|noun         91.46%     281        ORG_D        Child rule ic$|adj|is$|noun is used
e$|verb|ing$|noun      91.43%     210        Suggestions  Good SD-Rule
ian$|adj|ia$|noun      86.31%     263        Suggestions  Duplicated, parent rule an$|adj|a$|noun is used
al$|adj|us$|noun       84.35%     262        Suggestions  Good SD-Rule
es$|noun|ic$|adj       73.91%     23         Suggestions  Good SD-Rule
Bad Rules
$|noun|ize$|verb       59.05%     442        Suggestions  Bad SD-Rule
ian$|noun|ia$|noun     0.36%      274        Suggestions  Duplicated, parent rule an$|noun|a$|noun is a bad SD-Rule
es$|noun|ic$|noun      0.00%      19         Suggestions  Bad SD-Rule
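
The Results column above refers to parent and child rules. Judging only from the rule pairs listed in this report (for example, ability$|noun|able$|adj under ility$|noun|le$|adj), a child rule appears to be its parent rule with the same non-empty prefix added to both suffixes, and rules appear to be written with their two halves in alphabetical order. The Python sketch below illustrates that reading. The names SDRule, parse_rule, format_rule, normalize_order and is_child_of are hypothetical, and this is an assumption-based illustration, not the Lexical Tools implementation. It does not determine root parent rules.

```python
from typing import NamedTuple


class SDRule(NamedTuple):
    """One SD-Rule: two (suffix, category) halves, e.g. ility$|noun|le$|adj."""
    suffix_a: str
    cat_a: str
    suffix_b: str
    cat_b: str


def parse_rule(text: str) -> SDRule:
    """Split 'suffix$|cat|suffix$|cat' into fields, dropping the trailing '$'."""
    suffix_a, cat_a, suffix_b, cat_b = text.split("|")
    return SDRule(suffix_a.rstrip("$"), cat_a, suffix_b.rstrip("$"), cat_b)


def format_rule(rule: SDRule) -> str:
    """Render a rule back into the 'suffix$|cat|suffix$|cat' notation."""
    return f"{rule.suffix_a}$|{rule.cat_a}|{rule.suffix_b}$|{rule.cat_b}"


def normalize_order(rule: SDRule) -> SDRule:
    """Put the two halves in alphabetical order (by suffix, then category)."""
    first, second = sorted([(rule.suffix_a, rule.cat_a), (rule.suffix_b, rule.cat_b)])
    return SDRule(first[0], first[1], second[0], second[1])


def is_child_of(child: SDRule, parent: SDRule) -> bool:
    """Assumed test: child equals parent with one shared non-empty prefix
    prepended to both suffixes, and the categories match."""
    c, p = normalize_order(child), normalize_order(parent)
    if (c.cat_a, c.cat_b) != (p.cat_a, p.cat_b):
        return False
    if not c.suffix_a.endswith(p.suffix_a):
        return False
    prefix = c.suffix_a[: len(c.suffix_a) - len(p.suffix_a)]
    return prefix != "" and c.suffix_b == prefix + p.suffix_b


print(is_child_of(parse_rule("ability$|noun|able$|adj"),
                  parse_rule("ility$|noun|le$|adj")))        # True
print(is_child_of(parse_rule("ian$|adj|ia$|noun"),
                  parse_rule("an$|adj|a$|noun")))            # True
print(format_rule(normalize_order(parse_rule("an$|adj|a$|noun"))))  # a$|noun|an$|adj
```

Under this reading, a proposed rule whose suffix pair differs by more than a shared prefix, such as se$|verb|zation$|noun against e$|verb|ion$|noun, is not treated as a child.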

II. Comparison of SD-Rule set:

Item                       2014                2015
Total Unique Rules         96                  101
Total Good Rules           73                  76
Opti. System Precision     95.30%              95.22%
Opti. System Recall        95.01%              95.70%
Opti. System Performance   1.9031              1.9093
Cutoff Rule                ar$|adj|e$|noun     ar$|adj|e$|noun
Optimized Set              2014 Optimized Set  2015 Optimized Set
Optimized Diagram
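
The performance figures in the table above appear to be the sum of precision and recall, each expressed as a fraction. That definition is inferred from the numbers and is not stated in the table. The short Python check below uses a hypothetical performance helper to test that reading against the reported values.

```python
# Values copied from the comparison table (percentages converted to fractions).
SYSTEMS = {
    2014: {"precision": 0.9530, "recall": 0.9501, "reported": 1.9031},
    2015: {"precision": 0.9522, "recall": 0.9570, "reported": 1.9093},
}


def performance(precision: float, recall: float) -> float:
    """Assumed definition (not stated in the source): precision + recall."""
    return precision + recall


# Difference between the computed and the reported performance, per year.
gaps = {
    year: abs(performance(row["precision"], row["recall"]) - row["reported"])
    for year, row in SYSTEMS.items()
}

for year, gap in gaps.items():
    print(year, f"gap = {gap:.4f}")
```

The 2014 value matches to four decimals. The 2015 sum of the rounded percentages is 1.9092 against a reported 1.9093, a gap that is consistent with the published precision and recall having been rounded.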

For the optimal set:

  • The optimized sets for 2014 and 2015 are similar; see the SD-Rule rank mapping for details.
  • All good rules in 2014 are also in 2015.
  • The 2014 optimal set has 96 SD-Rules, 73 of which are good.
  • The 2015 optimal set has 101 SD-Rules, 76 of which are good.

III. Transaction Details:

The detailed transactions of the SD-Rules are described below:

  • Baseline SD-Rule count:

  • Good SD-Rules count in Optimal Set:
    • 2014 has 73 good rules, while 2015 has 76 good rules in the optimal set:
    • All 73 good SD-Rules in 2014 are good rules in 2015. They may be identical, or replaced by their parent rules or child rules.
    • The evaluation found 11 of the 15 new rules to be good. Why, then, did the total number of good SD-Rules increase by only 3 (from 73 to 76) rather than by 11 (from 73 to 84)? The reason is that parent-child rule relationships are involved: a new parent rule can replace one or two existing child rules. See the SD-Rule rank mapping for details. The changes are summarized below:

      Type            2014  2015  Details
      No Change       65    65    ...
      Parent-1-Child  4     4     2014 rule -> 2015 rule
                                  02: ability$|noun|able$|adj -> 09: ility$|noun|le$|adj
                                  08: ic$|adj|ically$|adv -> 15: $|adj|ally$|adv
                                  21: ency$|noun|ent$|adj -> 19: cy$|noun|t$|adj
                                  55: ion$|noun|ional$|adj -> 70: $|noun|al$|adj
      Parent-2-Child  4     2     2014 rules -> 2015 rule
                                  16: ance$|noun|ant$|adj, 18: ence$|noun|ent$|adj -> 18: nce$|noun|nt$|adj
                                  10: ate$|verb|ation$|noun, 63: se$|verb|sion$|noun -> 20: e$|verb|ion$|noun
      New in 2015     0     5     02: se$|verb|zation$|noun
                                  03: sation$|noun|ze$|verb
                                  45: e$|verb|ing$|noun
                                  61: al$|adj|us$|noun
                                  67: es$|noun|ic$|adj
      Total           73    76

    • The following table shows the transactions for the 15 newly proposed SD-Rules in 2015.

      ID      Proposed New Rule      Source   Results  Rank & Rule 2015           Rank & Rule 2014             Type              Count Change  Accu. Count
      Computer-Generated SD-Rules
      01-CG1  se$|verb|zation$|noun  nomD     Good     02: se$|verb|zation$|noun  None                         New in 2015       +1            74
      02-CG2  sation$|noun|ze$|verb  nomD     Good     03: sation$|noun|ze$|verb  None                         New in 2015       +1            75
      03-CG3  ility$|noun|le$|adj    nomD     Good     09: ility$|noun|le$|adj    02: ability$|noun|able$|adj  Parent-1-Child    +0            75
      04-CG4  $|adj|ally$|adv        orgD     Good     15: $|adj|ally$|adv        08: ic$|adj|ically$|adv      Parent-1-Child    +0            75
      05-CG5  nce$|noun|nt$|adj      nomD     Good     18: nce$|noun|nt$|adj      16: ance$|noun|ant$|adj      Parent-2-Child    -1            74
                                                                                  18: ence$|noun|ent$|adj
      06-CG6  cy$|noun|t$|adj        nomD     Good     19: cy$|noun|t$|adj        21: ency$|noun|ent$|adj      Parent-1-Child    +0            74
      07-CG7  e$|verb|ion$|noun      nomD     Good     20: e$|verb|ion$|noun      10: ate$|verb|ation$|noun    Parent-2-Child    -1            73
                                                                                  63: se$|verb|sion$|noun
      08-CG8  c$|adj|s$|noun         orgD     Good     43: ic$|adj|is$|noun       41: ic$|adj|is$|noun         Child             +0            73
      Expert-Suggested SD-Rules
      09-ES1  e$|verb|ing$|noun      Experts  Good     45: e$|verb|ing$|noun      None                         New in 2015       +1            74
      10-ES2  al$|adj|us$|noun       Experts  Good     61: al$|adj|us$|noun       None                         New in 2015       +1            75
      11-ES3  es$|noun|ic$|adj       Experts  Good     67: es$|noun|ic$|adj       None                         New in 2015       +1            76
      12-ES4  $|noun|ize$|verb       Experts  Bad      78: $|noun|ize$|verb       None                         New               +0            76
      13-ES5  es$|noun|ic$|noun      Experts  Bad      101: es$|noun|ic$|noun     None                         New               +0            76
      14-ES6  ian$|adj|ia$|noun      Experts  Good     57: a$|noun|an$|adj        53: a$|noun|an$|adj          Duplicated-Child  +0            76
      15-ES7  ian$|noun|ia$|noun     Experts  Bad      99: a$|noun|an$|noun       93: a$|noun|an$|noun         Duplicated-Child  +0            76

    • In the evaluation process, we removed two proposed new rules (ES6 and ES7) because they are child rules of existing rules. After normalization (alphabetical order and use of the root parent rule), they duplicate existing rules. Thus, we did not analyze the parent-child hierarchy for these two rules. Should we analyze them in future releases?
    • In our process, we analyze the parent-child hierarchy only for SD-Rules whose parent and child rules co-exist in the collected set, because the analysis is very expensive. Should we modify the process as follows?
      • Normalize all SD-Rules to their root parent rules.
      • Analyze the parent-child hierarchy for all SD-Rules.

      In 2015, we have 14 parent rules. With the modified process, there would be 101 parent rules, which is very expensive.
    • 2015 has 10 more root parent rules.
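
The count reconciliation in the transaction details above (73 good rules in 2014, 76 in 2015) can be replayed as a quick arithmetic check. The Python sketch below copies the per-type counts and the per-rule count changes from the two tables. The variable names are illustrative only.

```python
from itertools import accumulate

# Good-rule counts per transaction type as (2014, 2015), from the summary table.
COUNTS_BY_TYPE = {
    "No Change": (65, 65),
    "Parent-1-Child": (4, 4),
    "Parent-2-Child": (4, 2),
    "New in 2015": (0, 5),
}

# Count change per proposed rule, in table order (01-CG1 through 15-ES7).
COUNT_CHANGES = [+1, +1, 0, 0, -1, 0, -1, 0, +1, +1, +1, 0, 0, 0, 0]

total_2014 = sum(before for before, _ in COUNTS_BY_TYPE.values())
total_2015 = sum(after for _, after in COUNTS_BY_TYPE.values())

# Running total after each proposed rule, starting from the 2014 baseline.
running = list(accumulate(COUNT_CHANGES, initial=total_2014))[1:]

print(total_2014, total_2015)  # 73 76
print(running)  # [74, 75, 75, 75, 74, 74, 73, 73, 74, 75, 76, 76, 76, 76, 76]
```

Five new rules add one good rule each, and the two Parent-2-Child replacements remove one each, which gives the net change of +3.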

In conclusion, the optimized set of SD-Rules is very stable, as we expected. Does this imply that the Lexicon is a good representative subset of general English?