Lexical Tools

SD-Rule Transaction Details: 2014 to 2015

The SD-Rule transactions are detailed below:

Baseline of candidate SD-Rule count:
- The 2014 baseline collects 107 SD-Rules.
- The 2015 baseline collects 120 SD-Rules, adding 15 new SD-Rules to the 107 SD-Rules collected in 2014. Two of the new rules are duplicates because they are child-rules (120 = 107 + 15 - 2).
- The baseline set is then processed to remove duplicates arising from parent-child relationships. In 2015, 19 child-rules are removed from the 120 baseline SD-Rules, leaving 101 unique SD-Rules (120 - 19 = 101).

Good SD-Rules count in Optimal Set:

2014 has 73 good rules, while 2015 has 76 good rules in the optimal set:
All 73 good SD-Rules from 2014 remain good rules in 2015. They are either identical or replaced by their parent-rules or child-rules.

From the evaluation, 11 of the 15 new rules are good. Why does the total number of good SD-Rules increase by only 3 (from 73 to 76), rather than by 11 (to 84 = 73 + 11)? It is because:

- 4 of the new rules are parent-rules of 4 existing rules (+0).
- 2 of the new rules are parent-rules of 4 existing rules (-2).
- 5 of the new rules have no parent-child relationship with any existing rule (+5).

So, the total change is 5 - 2 = 3.
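The bookkeeping above can be checked with a few lines of Python. This is only a sketch: the `groups` list and its layout are illustrative names, not part of Lexical Tools, and the counts are the ones quoted in the text.

```python
# Each entry: (number of good new rules, existing good rules each one replaces).
# Counts come from the text; the structure and names are illustrative only.
groups = [
    (4, 1),  # new rule stands in for one existing rule: +1 - 1 = 0 each
    (2, 2),  # new parent-rule stands in for two existing rules: +1 - 2 = -1 each
    (5, 0),  # no parent-child relationship: +1 each
]

good_new_rules = sum(n for n, _ in groups)
net_change = sum(n * (1 - replaced) for n, replaced in groups)
good_2015 = 73 + net_change  # 73 good rules in 2014

print(good_new_rules, net_change, good_2015)  # 11 3 76
```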

This involves a complicated parent-child rule situation; please see the SD-Rule rank mapping for details. The changes are summarized below:

Type 2014 2015 Details

No Change 65 65 ...

Parent-1-Child

2014	2015
02: `ability$\|noun\|able$\|adj`	09: `ility$\|noun\|le$\|adj`
08: `ic$\|adj\|ically$\|adv`	15: `$\|adj\|ally$\|adv`
21: `ency$\|noun\|ent$\|adj`	19: `cy$\|noun\|t$\|adj`
55: `ion$\|noun\|ional$\|adj`	70: `$\|noun\|al$\|adj`

Parent-2-Child

2014	2015
16: `ance$\|noun\|ant$\|adj` 18: `ence$\|noun\|ent$\|adj`	18: `nce$\|noun\|nt$\|adj`
10: `ate$\|verb\|ation$\|noun` 63: `se$\|verb\|sion$\|noun`	20: `e$\|verb\|ion$\|noun`

New in 2015

02: se$|verb|zation$|noun
03: sation$|noun|ze$|verb
45: e$|verb|ing$|noun
61: al$|adj|us$|noun
67: es$|noun|ic$|adj

Total 73 76
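One way to read the parent-child pairs listed above is that a child-rule is its parent-rule with the same extra leading characters on both suffixes (for example, `ab` + `ility`/`le` gives `ability`/`able`). The sketch below tests that reading. It is an assumption inferred from the listed pairs, not the definition used by Lexical Tools, and `SdRule`, `parse`, and `is_child` are hypothetical names.

```python
from typing import NamedTuple


class SdRule(NamedTuple):
    suffix1: str
    cat1: str
    suffix2: str
    cat2: str


def parse(rule: str) -> SdRule:
    """Parse 'ability$|noun|able$|adj' into its four fields (drops the '$')."""
    s1, c1, s2, c2 = rule.split("|")
    return SdRule(s1.rstrip("$"), c1, s2.rstrip("$"), c2)


def is_child(child: SdRule, parent: SdRule) -> bool:
    """Assumed test: same categories, and both child suffixes equal the
    parent suffixes with one shared, non-empty string of leading characters."""
    if (child.cat1, child.cat2) != (parent.cat1, parent.cat2):
        return False
    if not child.suffix1.endswith(parent.suffix1):
        return False
    extra = child.suffix1[: len(child.suffix1) - len(parent.suffix1)]
    return extra != "" and child.suffix2 == extra + parent.suffix2


pairs = [
    ("ability$|noun|able$|adj", "ility$|noun|le$|adj"),
    ("ic$|adj|ically$|adv", "$|adj|ally$|adv"),
    ("ance$|noun|ant$|adj", "nce$|noun|nt$|adj"),
    ("ence$|noun|ent$|adj", "nce$|noun|nt$|adj"),
    ("ate$|verb|ation$|noun", "e$|verb|ion$|noun"),
    ("se$|verb|sion$|noun", "e$|verb|ion$|noun"),
]
results = [is_child(parse(c), parse(p)) for c, p in pairs]
print(all(results))
```

Under this reading, `se$|verb|zation$|noun` is not a child of `e$|verb|ion$|noun`, because the extra leading `s` on the first suffix does not match the extra `zat` on the second.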

The following table shows the transactions for the 15 new SD-Rules proposed in 2015.

Computer Generated SD-Rules
ID	Proposed New Rule	Source	Results	Rank & Rule 2015	Rank & Rule 2014	Type	Count Change	Accu. Count
01-CG1	`se$\|verb\|zation$\|noun`	nomD	Good	02: `se$\|verb\|zation$\|noun`	None	New in 2015	+1	74
02-CG2	`sation$\|noun\|ze$\|verb`	nomD	Good	03: `sation$\|noun\|ze$\|verb`	None	New in 2015	+1	75
03-CG3	`ility$\|noun\|le$\|adj`	nomD	Good	09: `ility$\|noun\|le$\|adj`	02: `ability$\|noun\|able$\|adj`	Parent-1-Child	+0	75
04-CG4	`$\|adj\|ally$\|adv`	orgD	Good	15: `$\|adj\|ally$\|adv`	08: `ic$\|adj\|ically$\|adv`	Parent-1-Child	+0	75
05-CG5	`nce$\|noun\|nt$\|adj`	nomD	Good	18: `nce$\|noun\|nt$\|adj`	16: `ance$\|noun\|ant$\|adj` 18: `ence$\|noun\|ent$\|adj`	Parent-2-Child	-1	74
06-CG6	`cy$\|noun\|t$\|adj`	nomD	Good	19: `cy$\|noun\|t$\|adj`	21: `ency$\|noun\|ent$\|adj`	Parent-1-Child	+0	74
07-CG7	`e$\|verb\|ion$\|noun`	nomD	Good	20: `e$\|verb\|ion$\|noun`	10: `ate$\|verb\|ation$\|noun` 63: `se$\|verb\|sion$\|noun`	Parent-2-Child	-1	73
08-CG8	`c$\|adj\|s$\|noun`	orgD	Good	43: `ic$\|adj\|is$\|noun`	41: `ic$\|adj\|is$\|noun`	Child	+0	73
Expert-Suggested SD-Rules
09-ES1	`e$\|verb\|ing$\|noun`	Experts	Good	45: `e$\|verb\|ing$\|noun`	None	New in 2015	+1	74
10-ES2	`al$\|adj\|us$\|noun`	Experts	Good	61: `al$\|adj\|us$\|noun`	None	New in 2015	+1	75
11-ES3	`es$\|noun\|ic$\|adj`	Experts	Good	67: `es$\|noun\|ic$\|adj`	None	New in 2015	+1	76

12-ES4	`$\|noun\|ize$\|verb`	Experts	Bad	78: `$\|noun\|ize$\|verb`	None	New	+0	76
13-ES5	`es$\|noun\|ic$\|noun`	Experts	Bad	101: `es$\|noun\|ic$\|noun`	None	New	+0	76
14-ES6	`ian$\|adj\|ia$\|noun`	Experts	Good	57: `a$\|noun\|an$\|adj`	53: `a$\|noun\|an$\|adj`	Duplicated-Child	+0	76
15-ES7	`ian$\|noun\|ia$\|noun`	Experts	Bad	99: `a$\|noun\|an$\|noun`	93: `a$\|noun\|an$\|noun`	Duplicated-Child	+0	76
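The "Accu. Count" column is a running total of the "Count Change" column, starting from the 73 good rules of 2014. A minimal sketch that replays it, with the IDs and changes copied from the table (the variable names are illustrative):

```python
from itertools import accumulate

# (ID, count change) pairs copied from the table above.
changes = [
    ("01-CG1", +1), ("02-CG2", +1), ("03-CG3", 0), ("04-CG4", 0),
    ("05-CG5", -1), ("06-CG6", 0), ("07-CG7", -1), ("08-CG8", 0),
    ("09-ES1", +1), ("10-ES2", +1), ("11-ES3", +1), ("12-ES4", 0),
    ("13-ES5", 0), ("14-ES6", 0), ("15-ES7", 0),
]

# Start from the 73 good rules of 2014 and drop the initial value itself.
running = list(accumulate((delta for _, delta in changes), initial=73))[1:]

for (rule_id, delta), total in zip(changes, running):
    print(f"{rule_id}\t{delta:+d}\t{total}")
```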

In the evaluation process, we removed two proposed new rules (ES6 and ES7) because they are child-rules of existing rules. After normalization (alphabetic order and use of the root-parent-rule), they duplicate existing rules. Thus, we did not analyze the parent-child hierarchy for these two rules. Should we analyze them in future releases?
In our process, we analyze the parent-child hierarchy only for those SD-Rules whose parent-rules and child-rules co-exist in the collected set, because the analysis is very expensive. Should we modify the process as follows?
- Normalize all SD-Rules to their root-parent-rules.
- Analyze the parent-child hierarchy for all SD-Rules.
In 2015, we have 14 parent-rules. If we adopt this modified process, there will be 101 parent-rules, which is very expensive!
2015 has 10 more root parent rules.
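To illustrate the normalization mentioned above, the sketch below puts the two halves of a rule in alphabetic order and then replaces the rule with an ancestor that is already in a known rule set. The ancestor search (stripping shared leading characters from both suffixes) and the choice of the most-stripped match are assumptions made for this example; the source does not spell out how the root-parent-rule is determined, and `ordered`, `ancestors`, `normalize`, and `known` are hypothetical names.

```python
import os


def parse(rule):
    """'ian$|adj|ia$|noun' -> [('ian', 'adj'), ('ia', 'noun')]"""
    s1, c1, s2, c2 = rule.split("|")
    return [(s1.rstrip("$"), c1), (s2.rstrip("$"), c2)]


def fmt(halves):
    return "|".join(f"{suffix}$|{cat}" for suffix, cat in halves)


def ordered(rule):
    """Put the two (suffix, category) halves in alphabetic order."""
    return fmt(sorted(parse(rule)))


def ancestors(rule):
    """Yield candidate parents by stripping 1..n shared leading characters."""
    (s1, c1), (s2, c2) = parse(rule)
    shared = len(os.path.commonprefix([s1, s2]))
    for k in range(1, shared + 1):
        yield ordered(fmt([(s1[k:], c1), (s2[k:], c2)]))


def normalize(rule, known):
    """Assumed normalization: alphabetic order, then the most-stripped
    ancestor found in `known` (or the ordered rule itself if none is)."""
    rule = ordered(rule)
    found = [a for a in ancestors(rule) if a in known]
    return found[-1] if found else rule


# Stand-in for the existing rule set; only the two relevant rules are listed.
known = {"a$|noun|an$|adj", "a$|noun|an$|noun"}

es6 = normalize("ian$|adj|ia$|noun", known)
es7 = normalize("ian$|noun|ia$|noun", known)
print(es6)
print(es7)
```

With this stand-in set, ES6 and ES7 normalize to `a$|noun|an$|adj` and `a$|noun|an$|noun`, the existing rules shown for them in the table, which is why they count as duplicates.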

In conclusion, the optimized set of SD-Rules is very stable, as we expected. Does this imply that the Lexicon is a good representative subset of general English?