Lexical Tools

Case Study: 2024 SD-Rules Evaluating and Optimizing

This page details the step-by-step procedures used to add and evaluate new SD-Rules and to optimize the SD-Rule set for the 2024 Lexical Tools.

I. Description

The set of SD-Rules includes rules that have a parent-child relationship. For example, the two SD-Rules below have a parent-child relationship, and both are in the SD-Rule set.

  • $|adj|ally$|adv|2015|ORG_FACT|PARENT
  • ic$|adj|ically$|adv|2013|ORG_FACT|CHILD
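
The relationship between the two rules above can be checked mechanically. The following is a minimal sketch, not the Lexical Tools implementation. It assumes the first four pipe-separated fields are a source suffix, source category, target suffix, and target category, that "$" marks the end of a word, and that the last field is the rule type. The names SdRule, parse_rule, and is_child_of are hypothetical. Under these assumptions, a child rule is a parent rule with the same extra characters added in front of both suffixes.

```python
from typing import NamedTuple


class SdRule(NamedTuple):
    in_suffix: str    # source suffix with the trailing "$" removed, e.g. "ic"
    in_cat: str       # source category, e.g. "adj"
    out_suffix: str   # target suffix with the trailing "$" removed, e.g. "ically"
    out_cat: str      # target category, e.g. "adv"
    rule_type: str    # last field of the line: PARENT, CHILD, or SELF


def parse_rule(line: str) -> SdRule:
    """Parse one rule line; the field layout is an assumption, see above."""
    fields = line.strip().split("|")
    in_suffix, in_cat, out_suffix, out_cat = fields[:4]
    return SdRule(in_suffix.rstrip("$"), in_cat,
                  out_suffix.rstrip("$"), out_cat, fields[-1])


def is_child_of(child: SdRule, parent: SdRule) -> bool:
    """True if child narrows parent by the same leading characters on both sides."""
    if (child.in_cat, child.out_cat) != (parent.in_cat, parent.out_cat):
        return False
    if not (child.in_suffix.endswith(parent.in_suffix)
            and child.out_suffix.endswith(parent.out_suffix)):
        return False
    # Characters the child adds in front of each parent suffix.
    extra_in = child.in_suffix[:len(child.in_suffix) - len(parent.in_suffix)]
    extra_out = child.out_suffix[:len(child.out_suffix) - len(parent.out_suffix)]
    return extra_in != "" and extra_in == extra_out


parent = parse_rule("$|adj|ally$|adv|2015|ORG_FACT|PARENT")
child = parse_rule("ic$|adj|ically$|adv|2013|ORG_FACT|CHILD")
print(is_child_of(child, parent))
```

Here the child adds "ic" in front of both the empty source suffix and the target suffix "ally", so the check succeeds for this pair.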

An optimized SD-Rule set should not include a parent rule and its child rules at the same time. The objective of optimization is to choose, from each group of parent and child rules, the rules with the better performance. All tagged SD-candidates are used for this evaluation. Precision, recall, and the minimum occurrence rate and coverage for decomposition (from the parent rules) are used to calculate the performance and determine the best rules. The general process is:

  • establish the baseline by removing CHILD rules from the set. That is, keep only the SD-Rules that are PARENT or SELF.
  • go through all parent rules and their decomposed child rules to evaluate the overall performance of the set. The decomposition of child rules considers two factors:
    • occurrence: the number of valid instances of a child rule must be above a certain threshold.
    • coverage: the valid instances of a child rule must cover more than a certain percentage of the valid instances of its parent rule.

    The algorithm is summarized as follows:

    • All SD-Rules in the evaluated set are sorted in descending order of:
      • precision (= relevant, retrieved No./retrieved No.), then
      • retrieved No. rate (if precision is the same)
    • The objective is to keep the accumulated precision of the selected SD-Rules above the target (95%).
    • The accumulated recall rate, F1, and total valid instances (relevant, retrieved No.) are used as references to monitor the performance of the optimized SD-Rule set.
    • The local occurrence rate (40% of its parent) and local cover-recall (coverage) rate (25% of its parent) are used as criteria during the decomposition. That is, if a child rule has too few occurrence instances or too little coverage of its parent, it is not decomposed as a child rule (even if it has high precision).
  • go through all new SD-Rules for evaluation
  • ignore (do not decompose) existing SELF rules
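
The ranking and accumulation steps summarized above can be sketched as follows. This is an illustration under stated assumptions, not the procedure's actual code. RuleStats, select_rules, and passes_decomposition are hypothetical names. The raw retrieved count is used as the tie-breaker for rules with equal precision. "Above the target" is read as greater than or equal to 95%. The occurrence and coverage rates are taken as precomputed inputs because the text does not spell out how they are derived from the tagged candidates.

```python
from dataclasses import dataclass


@dataclass
class RuleStats:
    name: str
    retrieved: int            # tagged candidates matched by the rule
    relevant_retrieved: int   # matched candidates tagged as valid

    @property
    def precision(self) -> float:
        return self.relevant_retrieved / self.retrieved if self.retrieved else 0.0


def select_rules(rules, total_relevant, target_precision=0.95):
    """Keep the highest-ranked rules while accumulated precision meets the target."""
    # Sort by precision, then by retrieved count, both descending.
    ranked = sorted(rules, key=lambda r: (r.precision, r.retrieved), reverse=True)
    selected = []
    summary = {"precision": 0.0, "recall": 0.0, "f1": 0.0, "valid_instances": 0}
    cum_retrieved = cum_relevant = 0
    for rule in ranked:
        cum_retrieved += rule.retrieved
        cum_relevant += rule.relevant_retrieved
        precision = cum_relevant / cum_retrieved if cum_retrieved else 0.0
        if precision < target_precision:
            break  # adding this rule would drop the set below the target
        recall = cum_relevant / total_relevant if total_relevant else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        selected.append(rule)
        summary = {"precision": precision, "recall": recall,
                   "f1": f1, "valid_instances": cum_relevant}
    return selected, summary


def passes_decomposition(occurrence_rate, coverage_rate,
                         min_occurrence=0.40, min_coverage=0.25):
    """A child rule is decomposed only if both local rates reach their minimums."""
    return occurrence_rate >= min_occurrence and coverage_rate >= min_coverage


# Made-up counts for illustration only; these are not results from the study.
rules = [RuleStats("a", 100, 99), RuleStats("b", 50, 48), RuleStats("c", 40, 20)]
selected, summary = select_rules(rules, total_relevant=200)
print([r.name for r in selected], summary)
```

Because the rules are sorted by precision in descending order, the accumulated precision never increases as rules are added, so stopping at the first rule that pushes it below the target gives the largest qualifying set for that ordering.
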

II. Procedures

III. Results

IV. Future Work