Lexical Tools

SD-Rule Transaction Details: 2015 to 2016

The SD-Rule transactions from 2015 to 2016 are detailed below.

The following table shows the transactions for the 12 newly proposed SD-Rules in 2016.

Computer Generated SD-Rules
ID	Proposed New Rule	Source	Results	Rank & Rule 2015	Rank & Rule 2016	Type	Count Change	Accu. Count
01-CG1	`e$\|verb\|is$\|noun`	nomD	Good	34: `ose$\|verb\|osis$\|noun`	27: `se$\|verb\|sis$\|noun`	Parent-1-Child	+0	75*
02-CG2	`sia$\|noun\|tic$\|adj`	orgD	Good	None	40: `sia$\|noun\|tic$\|adj`	New in 2016	+1	76
03-CG3	`on$\|noun\|ve$\|adj`	orgD	Good	None	48: `on$\|noun\|ve$\|adj`	New in 2016	+1	77
04-CG4	`e$\|noun\|ic$\|adj`	orgD	Good	None	49: `e$\|noun\|ic$\|adj`	New in 2016	+1	78
05-CG5	`$\|adj\|ism$\|noun`	nomD	Good	None	51: `$\|adj\|ism$\|noun`	New in 2016	+1	79
06-CG6	`ation$\|noun\|ed$\|adj`	nomD	Good	None	67: `ation$\|noun\|ed$\|adj`	New in 2016	+1	80
07-CG7	`$\|noun\|ship$\|noun`	orgD	Good	None	70: `$\|noun\|ship$\|noun`	New in 2016	+1	81

08-CG8	`e$\|adj\|ion$\|noun`	nomD	Bad	None	88: `e$\|adj\|ion$\|noun`	New in 2016	+0	81
09-CG9	`$\|noun\|age$\|noun`	orgD	Bad	None	96: `$\|noun\|age$\|noun`	New in 2016	+0	81
10-CG10	`e$\|verb\|ing$\|noun`	nomD	Good	44: `e$\|verb\|ing$\|noun`	47: `e$\|verb\|ion$\|noun`	Duplicate	+0	81
Expert-Suggested SD-Rules
11-ES1	`esis$\|noun\|ic$\|adj`	Experts	Good	None	13: `genesis$\|noun\|genic$\|adj`	New in 2016	+1	82
12-ES2	`al$\|adj\|ine$\|noun`	Experts	Bad	None	98: `al$\|adj\|ine$\|noun`	New in 2016	+0	82

* 75 of the 76 good SD-Rules from 2015 are also evaluated as good rules in 2016. Each is either identical to its 2015 form or has been replaced by a parent-rule or child-rule. Only the lowest-ranked rule (rank 76) in the previous optimal set, ar$|adj|e$|noun, is evaluated as a bad rule in the 2016 release.
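
For readers unfamiliar with the rule notation, the following is a minimal sketch of one plausible reading of it. It assumes each rule has the form `source-suffix$|source-category|target-suffix$|target-category`, that `$` marks the end of the word, and that a rule is applied by swapping the source suffix for the target suffix. The names `SDRule`, `parse_sd_rule`, and `apply_sd_rule` are hypothetical and are not part of the Lexical Tools API; the example words are illustrations only, not data from the evaluation.

```python
from typing import NamedTuple, Optional


class SDRule(NamedTuple):
    """One suffix derivation rule, split into its four fields (assumed layout)."""
    src_suffix: str
    src_cat: str
    tgt_suffix: str
    tgt_cat: str


def parse_sd_rule(text: str) -> SDRule:
    """Parse a rule such as 'se$|verb|sis$|noun' into its four fields."""
    src_suffix, src_cat, tgt_suffix, tgt_cat = text.split("|")
    # '$' is assumed to be an end-of-word anchor, so it is dropped here.
    return SDRule(src_suffix.rstrip("$"), src_cat, tgt_suffix.rstrip("$"), tgt_cat)


def apply_sd_rule(rule: SDRule, word: str) -> Optional[str]:
    """Return the derived form, or None if the word lacks the source suffix."""
    if not word.endswith(rule.src_suffix):
        return None
    # An empty source suffix (as in '$|noun|ship$|noun') keeps the whole word.
    stem = word[: len(word) - len(rule.src_suffix)]
    return stem + rule.tgt_suffix


derived_a = apply_sd_rule(parse_sd_rule("$|noun|ship$|noun"), "friend")
derived_b = apply_sd_rule(parse_sd_rule("se$|verb|sis$|noun"), "diagnose")
derived_c = apply_sd_rule(parse_sd_rule("se$|verb|sis$|noun"), "walk")
```

Under these assumptions, `derived_a` is `"friendship"`, `derived_b` is `"diagnosis"`, and `derived_c` is `None` because "walk" does not end in "se". The sketch only generates candidate forms; deciding whether a candidate is a valid derivation is the evaluation step that the table reports.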

Good SD-Rules count in Optimal Set:
- 2015 has 76 good rules, while 2016 has 82 good rules in the optimal set.
- From the evaluation, 8 of the 12 new rules are good (3 are bad; 1 is a duplicate). Why did the total number of good SD-Rules increase by only 6 (from 76 to 82) rather than by 8 (to 84)? The reasons are:
  - 1 good rule from 2015 fell below the cutoff and became a bad rule (-1).
  - 1 of the good new rules is the parent-rule of 1 existing rule and replaces it (+0).
  - 7 new rules have no parent-child relationship with any existing rule (+7).
  - So the total change is 7 - 1 = 6.
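
The reconciliation above can be checked against the "Count Change" and "Accu. Count" columns of the transaction table. The sketch below copies the per-rule changes from that table and starts from 75, the 2015 total of 76 minus the one rule that fell below the cutoff. The variable names are illustrative only.

```python
from itertools import accumulate

# "Count Change" column from the transaction table, keyed by rule ID.
count_change = {
    "01-CG1": 0,   # parent-rule replaces an existing child rule
    "02-CG2": 1,
    "03-CG3": 1,
    "04-CG4": 1,
    "05-CG5": 1,
    "06-CG6": 1,
    "07-CG7": 1,
    "08-CG8": 0,   # bad
    "09-CG9": 0,   # bad
    "10-CG10": 0,  # duplicate
    "11-ES1": 1,
    "12-ES2": 0,   # bad
}

good_2015 = 76
dropped_below_cutoff = 1
baseline = good_2015 - dropped_below_cutoff  # 75 rules carried over

# Running total after each proposed rule (the "Accu. Count" column).
running = list(accumulate(count_change.values(), initial=baseline))[1:]

good_2016 = running[-1]
net_change = good_2016 - good_2015
print(good_2016, net_change)
```

The running totals reproduce the "Accu. Count" column (75, 76, ..., 82), and the net change is 7 - 1 = 6.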

Good Rules comparison (2015-2016):

Type	2015	2016	Details
No Change	74	74	...
Good rule turned bad	1	0	`ar$\|adj\|e$\|noun`
Parent-1-Child	1	1	2015: 34: `ose$\|verb\|osis$\|noun`; 2016: 27: `se$\|verb\|sis$\|noun`
New in 2016	0	7	13: `genesis$\|noun\|genic$\|adj`; 40: `sia$\|noun\|tic$\|adj`; 48: `on$\|noun\|ve$\|adj`; 49: `e$\|noun\|ic$\|adj`; 51: `$\|adj\|ism$\|noun`; 67: `ation$\|noun\|ed$\|adj`; 70: `$\|noun\|ship$\|noun`
Total	76	82

In our process, we analyze the parent-child hierarchy only for SD-Rules whose parent and child rules co-exist in the collected set, because evaluating all parent-child rules is very expensive (time-consuming). Should we modify the process as follows?
- Normalize all SD-Rules to their root-parent-rules.
- Analyze the parent-child hierarchy for all SD-Rules.
In 2016, we spent about 2 weeks evaluating 16 parent rules. With the modified process, there would be 101 parent rules to evaluate, which is very expensive.
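
As a rough illustration of the proposed normalization step, the sketch below assumes that a parent-rule is obtained by stripping the leading characters shared by the source and target suffixes, and that the root-parent-rule is reached when no shared leading characters remain. This definition is inferred from the examples in the table (for instance, `ose$|verb|osis$|noun` and `se$|verb|sis$|noun` both reduce to `e$|verb|is$|noun`) and may not match the actual procedure. The function name `root_parent` is hypothetical, and the effort estimate assumes evaluation time scales linearly with the number of parent rules.

```python
import os


def root_parent(rule: str) -> str:
    """Strip the common leading characters of both suffixes (assumed definition)."""
    src, src_cat, tgt, tgt_cat = rule.split("|")
    src, tgt = src.rstrip("$"), tgt.rstrip("$")
    # os.path.commonprefix compares character by character, not path by path.
    shared = len(os.path.commonprefix([src, tgt]))
    return f"{src[shared:]}$|{src_cat}|{tgt[shared:]}$|{tgt_cat}"


root_a = root_parent("ose$|verb|osis$|noun")
root_b = root_parent("se$|verb|sis$|noun")
root_c = root_parent("genesis$|noun|genic$|adj")
root_d = root_parent("$|noun|ship$|noun")

# Effort estimate, assuming linear scaling from the 2016 figures in the text.
weeks_per_parent = 2 / 16
estimated_weeks = 101 * weeks_per_parent
print(root_a, root_c, estimated_weeks)
```

Under these assumptions, `root_a` and `root_b` are both `e$|verb|is$|noun`, `root_c` is `esis$|noun|ic$|adj`, and `root_d` is unchanged because its source suffix is empty. The linear estimate gives about 12.6 weeks for 101 parent rules, which is consistent with the concern about cost.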

The conclusion is that the optimized set of SD-Rules is very stable, as we expected. Does this imply that the Lexicon is a good representative subset of general English?