Lexical Tools

SD-Rules Optimization Models and Procedures

  • Optimize a set of SD-rules
    Keep the good rules and remove the bad rules from a known set of rules:
    • The original Lexical Tools set of 97 SD-Rules is used as the baseline
    • Duplicate SD-Rules are removed (none were found in this set of 97)
    • Evaluate all parent-child rules and find the optimized set
      Go through each parent-rule and choose either the parent-rule or its child-rules by the following steps:
      • All child-rules are temporarily removed (the parent-rule is kept)
      • Decompose the parent-rule into child-rules, grandchild-rules, etc.
        Recursively decompose a child-rule only if it is a potentially good rule:
        • its accuracy rate is higher than that of the root parent-rule
        • its coverage rate is higher than 40% of that of the root parent-rule. 40% is the default threshold and can be adjusted.
      • Evaluate child-rules for one parent-rule at a time
        Child-rules should be evaluated only if:
        • The child-rule has a higher accuracy rate than its parent-rule.
          Otherwise, ignore the child-rule and use its parent-rule, because the parent-rule then has both a better accuracy rate and a better coverage rate than the child-rule.
        • The child-rule covers more than 35% of its parent-rule's coverage.
          Every rule in the optimized set has to be a good rule, so child-rules must have good coverage. 35% is the default threshold and can be adjusted.
        • Compare the system performance of the parent-rule with that of the child-rules, and choose the better of the two:
          • the one with higher system performance
          • the one with more rules, if system performance is the same
    • Find the optimized SD-Rule set with the best system performance by superposing the better choices from the parent-child comparisons
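
    The parent-child selection steps above can be sketched in code. This is a minimal illustration under stated assumptions, not the Lexical Tools implementation: the Rule record, the function names, and the toy net_correct measure are all hypothetical, "coverage" is taken to mean the number of matched SD-pairs, and the qualifying child-rules of a parent are compared against it as a group. The actual system performance measure is not defined in this section, so it is passed in as a function.

```python
from dataclasses import dataclass, field
from typing import Callable, List

DECOMPOSE_MIN_COVERAGE = 0.40  # default from the text, adjustable
EVALUATE_MIN_COVERAGE = 0.35   # default from the text, adjustable


@dataclass
class Rule:
    """Hypothetical record of the tagging stats for one SD-rule."""
    name: str
    valid: int   # matched SD-pairs tagged as valid
    total: int   # all matched SD-pairs (used here as "coverage")
    children: List["Rule"] = field(default_factory=list)

    @property
    def accuracy(self) -> float:
        return self.valid / self.total if self.total else 0.0


def passes(rule: Rule, reference: Rule, min_coverage: float) -> bool:
    """True if rule beats reference on accuracy and keeps enough coverage."""
    return (rule.accuracy > reference.accuracy
            and rule.total > min_coverage * reference.total)


def decompose(rule: Rule, root: Rule) -> List[Rule]:
    """List descendants, recursing only into potentially good child-rules."""
    found: List[Rule] = []
    for child in rule.children:
        found.append(child)
        if passes(child, root, DECOMPOSE_MIN_COVERAGE):
            found.extend(decompose(child, root))
    return found


def choose(parent: Rule,
           performance: Callable[[List[Rule]], float]) -> List[Rule]:
    """Return either [parent] or its qualifying child-rules."""
    candidates = [c for c in parent.children
                  if passes(c, parent, EVALUATE_MIN_COVERAGE)]
    if not candidates:
        return [parent]
    p_parent = performance([parent])
    p_children = performance(candidates)
    if p_children > p_parent:
        return candidates
    # On a tie, prefer the option with more rules.
    if p_children == p_parent and len(candidates) > 1:
        return candidates
    return [parent]


def net_correct(rules: List[Rule]) -> float:
    """Toy stand-in for system performance: valid minus invalid pairs."""
    return float(sum(r.valid - (r.total - r.valid) for r in rules))


# Toy numbers, invented for illustration only.
parent = Rule("P", valid=60, total=100, children=[
    Rule("A", valid=38, total=40),  # accurate, covers 40% of P
    Rule("B", valid=5, total=30),   # less accurate than P
    Rule("C", valid=9, total=10),   # accurate, but covers only 10% of P
])
print([r.name for r in choose(parent, net_correct)])  # ['A']
```

    Running choose once per parent-rule and collecting the returned rules gives the superposed set described in the last step.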

  • Add a new rule to a set of SD-rules

    The same model and procedures as above can be used to evaluate new SD-rules. If a new SD-rule is proposed for addition to an (optimized) set of SD-rules, the procedures are as follows:

    • Check if the new rule is a duplicate rule,
      if so, no need to evaluate the new rule.
    • Check if the new rule is a child-rule,
      if so, use the same parent-child evaluation procedures as above.
    • Check if the new rule is a parent-rule,
      if so, use the same parent-child evaluation procedures as above.
    • Otherwise:
      Evaluate it by comparing system performance
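
    The four checks above amount to a small classification step. The sketch below is illustrative only: the SdRule tuple, is_child_of, and classify are hypothetical names, and the parent-child test used here (same categories, with the child's suffixes ending in the parent's suffixes) is an assumption about how the relation is defined, not something this section states. Duplicates are detected by exact match only.

```python
from typing import Iterable, NamedTuple


class SdRule(NamedTuple):
    """Hypothetical representation of a suffix derivation rule."""
    in_suffix: str
    in_cat: str
    out_suffix: str
    out_cat: str


def is_child_of(child: SdRule, parent: SdRule) -> bool:
    """Assumed relation: a child is a more specific form of its parent."""
    return (child != parent
            and child.in_cat == parent.in_cat
            and child.out_cat == parent.out_cat
            and child.in_suffix.endswith(parent.in_suffix)
            and child.out_suffix.endswith(parent.out_suffix))


def classify(new: SdRule, existing: Iterable[SdRule]) -> str:
    """Decide which evaluation procedure applies to a proposed rule."""
    rules = list(existing)
    if new in rules:
        return "duplicate"   # no evaluation needed
    if any(is_child_of(new, r) for r in rules):
        return "child"       # parent-child evaluation
    if any(is_child_of(r, new) for r in rules):
        return "parent"      # parent-child evaluation
    return "other"           # compare system performance directly


# Toy rule set, invented for illustration only.
current = [SdRule("ent", "adj", "ence", "noun")]
print(classify(SdRule("lent", "adj", "lence", "noun"), current))    # child
print(classify(SdRule("nt", "adj", "nce", "noun"), current))        # parent
print(classify(SdRule("ize", "verb", "ization", "noun"), current))  # other
```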

    The evaluation procedures need the tagging stats of the matching SD-pairs from the Lexicon:

    • Get all raw SD-pairs that match the new SD-Rule from the Lexicon
    • Tag the raw SD-pairs
    • These data can be derived from the existing set if the new rule is a child-rule.
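
    As a rough sketch of these three steps, the code below collects raw SD-pairs for a rule from a toy lexicon, summarizes their tags, and derives a child-rule's pairs from its parent's. Everything here is assumed for illustration: the lexicon is modeled as a set of (base form, category) entries, the tags as a dictionary of manual valid/invalid judgments, and the function names are hypothetical. The real Lexicon format and tagging process are not described in this section.

```python
from typing import Dict, List, Optional, Set, Tuple

Entry = Tuple[str, str]      # (base form, category)
Pair = Tuple[Entry, Entry]   # (source entry, derived entry)


def raw_pairs(lexicon: Set[Entry], in_suffix: str, in_cat: str,
              out_suffix: str, out_cat: str) -> List[Pair]:
    """Find entry pairs in the lexicon that match the suffix rule."""
    pairs: List[Pair] = []
    for word, cat in sorted(lexicon):
        if cat != in_cat or not word.endswith(in_suffix):
            continue
        stem = word[:len(word) - len(in_suffix)]
        target = (stem + out_suffix, out_cat)
        if target in lexicon:
            pairs.append(((word, cat), target))
    return pairs


def tagging_stats(pairs: List[Pair],
                  tags: Dict[Pair, bool]) -> Dict[str, Optional[float]]:
    """Summarize tags; accuracy is None when nothing has been tagged."""
    tagged = [p for p in pairs if p in tags]
    valid = sum(1 for p in tagged if tags[p])
    accuracy = valid / len(tagged) if tagged else None
    return {"matched": len(pairs), "tagged": len(tagged),
            "valid": valid, "accuracy": accuracy}


def child_pairs(parent_pairs: List[Pair], in_suffix: str,
                out_suffix: str) -> List[Pair]:
    """Derive a child-rule's pairs from the pairs of its parent-rule."""
    return [(src, dst) for src, dst in parent_pairs
            if src[0].endswith(in_suffix) and dst[0].endswith(out_suffix)]


# Toy lexicon and tags, invented for illustration only.
lexicon = {("patient", "adj"), ("patience", "noun"),
           ("silent", "adj"), ("silence", "noun"), ("recent", "adj")}
pairs = raw_pairs(lexicon, "ent", "adj", "ence", "noun")
tags = {pair: True for pair in pairs}
print(tagging_stats(pairs, tags))
print(child_pairs(pairs, "lent", "lence"))
```

    Because a child-rule matches a subset of its parent-rule's pairs, its tags can be reused from the parent without tagging again, which is what child_pairs relies on.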

The following figure shows the SD-Rules model and optimization procedures.