Lexical Tools

Example - Add SD-Rules Derived from factD

The original Lexical Tools include 4,467 SD-pairs, of which 4,110 are suffix SD-pairs. These SD-pairs can be used to derive possible SD-rules by following the same approach as in the nomD section:

Identify possible SD-rules by stripping the shared starting characters from each valid SD-pair generated from factD.

Select high-frequency SD-rules to add to the SD-rule set:

Possible SD-rule from factD	Root	Related	Notes
$\|noun\|less$\|adj\|131\|131	Yes	None	Selected
$\|verb\|$ion\|noun\|111\|111	Yes	Duplicated	Not selected
ist$\|noun\|y$\|noun\|63\|63	Yes	None	Selected
$\|adj\|ally$\|adv\|58\|58 => ic$\|adj\|ically$\|adv is used instead => need to verify the root stats	Yes	None	Selected
$\|noun\|ful$\|adj\|58\|58	Yes	None	Selected
c$\|adj\|s$\|noun\|54\|54 => ic$\|adj\|is$\|noun is used instead => need to verify the root stats	Yes	None	Selected
on$\|noun\|ve$\|adj\|38\|38	Yes	None	Not selected due to low frequency (coverage)
...	...	...	Not selected due to low frequency (coverage)
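
The stripping step described above can be sketched in a few lines of Python. This is a minimal illustration, not the Lexical Tools implementation: the function name `derive_rule` and the sample word pairs are assumptions chosen to reproduce the rule shapes in the table, and they are not taken from factD.

```python
from collections import Counter
from os.path import commonprefix


def derive_rule(base, base_cat, derived, derived_cat):
    """Strip the shared starting characters of an SD-pair and return the
    remaining suffixes as a rule string: suffix$|category|suffix$|category."""
    shared = len(commonprefix([base, derived]))
    return f"{base[shared:]}$|{base_cat}|{derived[shared:]}$|{derived_cat}"


# Illustrative SD-pairs (hypothetical sample, not the real factD data).
sample_pairs = [
    ("hope", "noun", "hopeless", "adj"),
    ("fear", "noun", "fearless", "adj"),
    ("biologist", "noun", "biology", "noun"),
    ("basic", "adj", "basically", "adv"),
    ("basic", "adj", "basis", "noun"),
]

# Count how often each derived rule occurs, most frequent first.
rule_counts = Counter(derive_rule(*pair) for pair in sample_pairs)
for rule, frequency in rule_counts.most_common():
    print(rule, frequency)
```

On this sample the pair basic/basically yields `$|adj|ally$|adv` and basic/basis yields `c$|adj|s$|noun`, which are the root forms listed in the table before they are narrowed to the `ic$` variants.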

Apply the same procedure used to add SD-rules from nomD to obtain the optimized set, using the optimized set of 2.3.4 as the new baseline. This task involves:

Retrieve all raw SD-pairs of the five selected SD-rules above from the Lexicon (2013)
Tag the raw SD-pairs
Get the stats of the SD-pairs for these five SD-rules
Add the rules to the SD-rule set and find the optimized set
The total number of valid SD-pairs (TotalYes) must be calculated as the sum of valid SD-pairs over all parent-rules.

The iterative results are shown as follows:

ID	New Candidate Rule	Total Yes	Total Rule No.	Rule No.	A. Rate	Occr.	Yes	No	SD-Rule	Status	Source	Notes	Sys A. Rate	Sys C. Rate	Sys. Perf	Notes
2.3.4 (prev. optimized set)		39,197	90	68	60.66%	183	111	72	ar$\|adj\|e$\|noun	2013	ORG_RULE	SELF	95.05%	94.60%	1.8965	Baseline
2.3.4.1	`12\|99.95%\|1931\|1930\|1\|0\|ic$\|adj\|ically$\|adv\|2013\|ORG_FACT\|SELF`	41,127 = 39,197 + 1930	91	69	60.66%	183	111	72	ar$\|adj\|e$\|noun	2013	ORG_RULE	SELF	95.28%	94.85%	1.9013	Better
2.3.4.2	`15\|99.64%\|559\|557\|2\|0\|$\|noun\|less$\|adj\|2013\|ORG_FACT\|SELF`	41,684 = 41,127 + 557	92	70	60.66%	183	111	72	ar$\|adj\|e$\|noun	2013	ORG_RULE	SELF	95.34%	94.92%	1.9026	Better
2.3.4.3	`40\|95.63%\|504\|482\|22\|0\|ist$\|noun\|y$\|noun\|2013\|ORG_FACT\|SELF`	42,166 = 41,684 + 482	93	71	60.66%	183	111	72	ar$\|adj\|e$\|noun	2013	ORG_RULE	SELF	95.35%	94.98%	1.9032	Better
2.3.4.4	`49\|91.70%\|277\|254\|23\|0\|ic$\|adj\|is$\|noun\|2013\|ORG_FACT\|SELF`	42,420 = 42,166 + 254	94	72	60.66%	183	111	72	ar$\|adj\|e$\|noun	2013	ORG_RULE	SELF	95.32%	95.01%	1.9033	Better
2.3.4.5	`55\|89.93%\|139\|125\|14\|0\|$\|noun\|ful$\|adj\|2013\|ORG_FACT\|SELF`	42,545 = 42,420 + 125	95	73	60.66%	183	111	72	ar$\|adj\|e$\|noun	2013	ORG_RULE	SELF	95.30%	95.02%	1.9033	Best
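
The running Total Yes column and the per-rule accuracy rates in the table above can be checked with a short script. This is a sketch under stated assumptions: the field positions in each candidate record (accuracy rate, occurrences, Yes, No, then the four rule fields) are inferred from the table columns, the meaning of the sixth field is not stated in the source and is ignored, and the names `parse_candidate` and `running_totals` are hypothetical.

```python
BASELINE_TOTAL_YES = 39197  # Total Yes of the 2.3.4 optimized set

# Candidate records copied from the table (pipe escapes removed).
CANDIDATES = [
    "12|99.95%|1931|1930|1|0|ic$|adj|ically$|adv|2013|ORG_FACT|SELF",
    "15|99.64%|559|557|2|0|$|noun|less$|adj|2013|ORG_FACT|SELF",
    "40|95.63%|504|482|22|0|ist$|noun|y$|noun|2013|ORG_FACT|SELF",
    "49|91.70%|277|254|23|0|ic$|adj|is$|noun|2013|ORG_FACT|SELF",
    "55|89.93%|139|125|14|0|$|noun|ful$|adj|2013|ORG_FACT|SELF",
]


def parse_candidate(record):
    """Split a record into the fields used here (assumed layout)."""
    fields = record.split("|")
    return {
        "listed_rate": fields[1],
        "occurrences": int(fields[2]),
        "yes": int(fields[3]),
        "no": int(fields[4]),
        "rule": "|".join(fields[6:10]),
    }


def running_totals(baseline, records):
    """Add each rule's valid pairs (Yes) to the running Total Yes and
    recompute its accuracy rate as Yes / occurrences."""
    total = baseline
    rows = []
    for record in records:
        candidate = parse_candidate(record)
        total += candidate["yes"]
        rate = f"{100 * candidate['yes'] / candidate['occurrences']:.2f}%"
        rows.append((candidate["rule"], rate, total))
    return rows


for rule, rate, total in running_totals(BASELINE_TOTAL_YES, CANDIDATES):
    print(f"{rule}\t{rate}\t{total:,}")
```

The recomputed rates match the rates listed in the records, and the final total is 42,545, in agreement with row 2.3.4.5.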

The table above shows the iterative results of adding the new rules derived from factD one at a time. All five selected SD-rules (those with the highest frequency and precision in factD) improve the system performance, so all five are added to the SD-rule set. Please note that the SD-rules ic$|adj|ically$|adv and ic$|adj|is$|noun are suggested from their root parent-rules $|adj|ally$|adv and c$|adj|s$|noun, respectively. Both root parent-rules should be re-evaluated by this system.

With these additions, the optimized set includes 73 of the 95 SD-rules and reaches a coverage rate of 95.02% and a system performance of 1.9033, at an accuracy rate of 95.30%. The diagram below shows the system accuracy and coverage curves of this optimized set.