SD-Rules Optimization Models and Procedures
- Optimize a set of SD-rules
Keep the good rules and remove the bad rules from a known set of rules:
- The original set of 97 SD-Rules in the Lexical Tools is used as the baseline
- Duplicate SD-Rules are removed (none were found in this set of 97)
- Evaluate all parent-child rules and find the optimized set
Go through each parent-rule and choose either the parent-rule or its child-rules by the following steps:
- All child-rules are temporarily removed (keep the parent-rule)
- Decompose a parent-rule to child-rules, grandchild-rules, etc.
Recursively decompose a child-rule only if it is potentially a good rule:
- its accuracy rate is higher than that of the root parent-rule
- its coverage rate is higher than 40% of that of the root parent-rule. The 40% threshold is a default and can be adjusted.
- Evaluate child-rules for one parent-rule at a time
A child-rule should be evaluated only if:
- The child-rule has a higher accuracy rate than its parent-rule.
Otherwise, ignore the child-rule and use its parent-rule, because the parent-rule has better accuracy and coverage rates than the child-rule.
- The child-rule has more than 35% of the coverage of its parent-rule.
Every rule in the optimized set must be a good rule, so child-rules need good coverage. The 35% threshold is a default and can be adjusted.
- Compare the system performance of the parent-rule with that of the child-rules, and choose the better one:
- the one with higher system performance
- the one with more rules if the system performance is the same
- Find the optimized SD-Rule set with the best system performance by superposing the better choice from each parent-child comparison
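The parent-child selection steps above can be sketched in Python as follows. This is only an illustrative sketch, not the Lexical Tools implementation: the names `RuleStats`, `is_candidate`, `choose`, and `optimize` are hypothetical, and `system_performance` is an assumed caller-supplied scoring function, since the text does not define how system performance is computed. The tie-breaking rule is read here as "prefer the child-rules only when they outnumber the single parent-rule".

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class RuleStats:
    """Hypothetical per-rule statistics (not a Lexical Tools type)."""
    name: str
    accuracy: float  # accuracy rate of the rule
    coverage: float  # coverage rate (or count) of the rule


# Assumed scoring function: maps a set of rules to one performance number.
Performance = Callable[[Sequence[RuleStats]], float]


def is_candidate(child: RuleStats, parent: RuleStats,
                 min_coverage_ratio: float = 0.35) -> bool:
    """A child-rule is evaluated only if it is more accurate than its
    parent and keeps more than 35% (default) of the parent's coverage."""
    return (child.accuracy > parent.accuracy
            and child.coverage > min_coverage_ratio * parent.coverage)


def choose(parent: RuleStats, children: Sequence[RuleStats],
           system_performance: Performance,
           min_coverage_ratio: float = 0.35) -> List[RuleStats]:
    """Return either [parent] or the qualifying child-rules."""
    candidates = [c for c in children
                  if is_candidate(c, parent, min_coverage_ratio)]
    if not candidates:
        return [parent]
    parent_score = system_performance([parent])
    child_score = system_performance(candidates)
    if child_score > parent_score:
        return candidates
    # Same performance: prefer the option with more rules.
    if child_score == parent_score and len(candidates) > 1:
        return candidates
    return [parent]


def optimize(groups: Sequence[Tuple[RuleStats, Sequence[RuleStats]]],
             system_performance: Performance) -> List[RuleStats]:
    """Superpose the better choice from every parent-child group."""
    optimized: List[RuleStats] = []
    for parent, children in groups:
        optimized.extend(choose(parent, children, system_performance))
    return optimized
```

Each parent-rule is handled independently, which mirrors the "one parent-rule at a time" step; the thresholds are parameters because the text states they can be adjusted.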
- Add a new rule to a set of SD-rules
The same model and procedures as above can be used to evaluate new SD-rules. If a new SD-rule is proposed for addition to an (optimized) set of SD-rules, the procedure is as follows:
- Check if the new rule is a duplicate rule;
if so, there is no need to evaluate the new rule.
- Check if the new rule is a child-rule,
if so, use the same parent-child evaluation procedures as above.
- Check if the new rule is a parent-rule,
if so, use the same parent-child evaluation procedures as above.
- Otherwise:
Evaluate it by comparing system performance
The evaluation procedure needs tagging statistics for the matching SD-pairs from the Lexicon:
- Get all raw SD-pairs that match the new SD-Rule from the Lexicon
- Tag the raw SD-pairs
- These data can be derived from the existing set if the new rule is a child-rule.
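The checks for a new rule can be sketched as a small classifier. Everything here is an assumption made for illustration: rules are represented as hypothetical tuples `(in_suffix, in_category, out_suffix, out_category)`, and a rule is treated as a child of another when both of its suffixes extend the parent's suffixes with the same leading characters. The actual parent-child definition used by the Lexical Tools may differ, so `suffix_child_of` is passed in as a replaceable predicate.

```python
from enum import Enum
from typing import Callable, Iterable, List, Tuple

# Hypothetical rule shape: (in_suffix, in_category, out_suffix, out_category)
Rule = Tuple[str, str, str, str]
# Hypothetical tagged SD-pair: (input_word, output_word, is_valid)
TaggedPair = Tuple[str, str, bool]


class Relation(Enum):
    DUPLICATE = "duplicate"  # no evaluation needed
    CHILD = "child"          # use the parent-child evaluation
    PARENT = "parent"        # use the parent-child evaluation
    OTHER = "other"          # compare system performance directly


def suffix_child_of(child: Rule, parent: Rule) -> bool:
    """Assumed definition: same categories, and both suffixes extend the
    parent's suffixes with the same non-empty leading string."""
    c_in, c_in_cat, c_out, c_out_cat = child
    p_in, p_in_cat, p_out, p_out_cat = parent
    if (c_in_cat, c_out_cat) != (p_in_cat, p_out_cat):
        return False
    if not (c_in.endswith(p_in) and c_out.endswith(p_out)):
        return False
    lead_in = c_in[:len(c_in) - len(p_in)]
    lead_out = c_out[:len(c_out) - len(p_out)]
    return lead_in != "" and lead_in == lead_out


def classify(new_rule: Rule, rule_set: Iterable[Rule],
             is_child_of: Callable[[Rule, Rule], bool]) -> Relation:
    """Apply the checks in the order given in the procedure."""
    rules = list(rule_set)
    if new_rule in rules:
        return Relation.DUPLICATE
    if any(is_child_of(new_rule, r) for r in rules):
        return Relation.CHILD
    if any(is_child_of(r, new_rule) for r in rules):
        return Relation.PARENT
    return Relation.OTHER


def derive_child_pairs(parent_pairs: Iterable[TaggedPair],
                       child: Rule) -> List[TaggedPair]:
    """Reuse the parent's tagged SD-pairs for a child-rule by keeping
    only the pairs that also match the child's suffixes."""
    in_suffix, _, out_suffix, _ = child
    return [p for p in parent_pairs
            if p[0].endswith(in_suffix) and p[1].endswith(out_suffix)]
```

Filtering the parent's tagged pairs reflects the last point above: when the new rule is a child-rule, its tagging statistics can be taken from data that already exists, so no new tagging is required.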
The following figure shows the SD-Rules model and optimization procedures.