Lexical Tools

Optimizing 2024 SD-Rule Set - Baseline

I. Get the stats (yes|no) from the current year's data

  • DIR: ${SUFFIXD_DIR}
  • Program:
    shell> cd ${SUFFIXD_DIR}/bin
    shell> GetSuffixD ${YEAR}
    11
    ALL
  • Outputs:
    • sdRules.stats.rpt.* (sdRules.stats.rpt.pipe is used in this analysis)

II. Establish the baseline: remove all CHILD SD-Rules and use the result as the baseline

  • Create a new directory: ${SUFFIXD_DIR}/data/${YEAR}/dateR/SdRulesOptimum/00.baseline
  • shell> cd ${SUFFIXD_DIR}/data/${YEAR}/dateR/SdRulesOptimum/00.baseline
  • shell> cp -p ../../../data/sdRules.stats.rpt.pipe sdRules.stats.in.${YEAR}
  • shell> cp -p sdRules.stats.in.${YEAR} sdRules.stats.in.${YEAR}.removeChild
  • Manually comment out (#) all CHILD rules (24); the following command counts the lines that carry the CHILD tag:
    shell> fgrep "|CHILD" sdRules.stats.in.${YEAR}.removeChild | wc -l
  • shell> ln -sf ./sdRules.stats.in.${YEAR}.removeChild sdRules.stats.in
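
The manual commenting step could also be scripted. The following is a minimal sketch, assuming that each rule occupies one line, that CHILD rules are exactly the lines containing the literal string "|CHILD", and that a leading "#" marks a comment. The helper name comment_out_child and the sample data are hypothetical and not part of the SuffixD tools.

```shell
#!/bin/sh
# Hypothetical helper: prefix every uncommented line containing "|CHILD"
# with "#". Reads the file named by $1 and writes the result to stdout.
comment_out_child() {
    awk 'index($0, "|CHILD") > 0 && substr($0, 1, 1) != "#" { print "#" $0; next }
         { print }' "$1"
}

# Demo on made-up data (the field layout here is illustrative only).
tmp=$(mktemp)
printf '%s\n' \
    '# comment line' \
    'ruleA|PARENT|x' \
    'ruleB|CHILD|y' \
    'ruleC|SELF|z' \
    'ruleD|CHILD|w' > "$tmp"

comment_out_child "$tmp" > "$tmp.removeChild"

# CHILD rules that are still active (uncommented) after the rewrite.
active_child=$(grep -v '^#' "$tmp.removeChild" | grep -c -F '|CHILD' || true)
# Lines matching "|CHILD" at all, commented or not.
tagged_child=$(grep -c -F '|CHILD' "$tmp.removeChild" || true)
# All comment lines, including the pre-existing one.
comment_lines=$(grep -c '^#' "$tmp.removeChild" || true)

echo "active CHILD rules: $active_child"
echo "lines tagged CHILD: $tagged_child"
echo "comment lines:      $comment_lines"

rm -f "$tmp" "$tmp.removeChild"
```

Note that a plain fgrep "|CHILD" count matches commented lines as well, so it reports the same number before and after the rules are commented out. Filtering out lines that start with "#" first, as in active_child above, shows whether any CHILD rule is still active.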

III. Get the Optimal Set

  • Algorithm:
    • Let the program select the optimal set automatically, that is, the set that covers a minimum of 95% precision over all root parent rules.
    • The resulting F1 may not be the best achievable, but 95% precision is the objective.

    • Then evaluate the child rules of these parent rules to find the best child rules for the optimal set.

  • Program:
    shell> cd ${SUFFIXD_DIR}/bin
    shell> GetSdRule ${YEAR}
    1
    others
    00.baseline
    0
  • Outputs:
    • ${SUFFIXD_DIR}/data/${YEAR}/dateR/SdRulesOptimum/00.baseline/sdRules.stats.out.*
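
The selection idea in the algorithm above can be sketched as follows. This is not the GetSdRule implementation, only an illustration under stated assumptions: the input is already sorted by descending precision, each line has the hypothetical layout ruleId|relevantRetrieved|retrieved, and the optimal set is taken to be the largest top-N prefix whose accumulated precision stays at or above the threshold. The helper name select_optimum and the sample counts are made up.

```shell
#!/bin/sh
# Hypothetical helper: report the largest top-N prefix of the rule list whose
# accumulated precision is at least $1 (e.g. 0.95).
# Input file ($2), one rule per line, sorted by descending precision:
#   ruleId|relevantRetrieved|retrieved      (layout is an assumption)
select_optimum() {
    awk -F'|' -v min="$1" '
        /^#/ || NF < 3 { next }      # skip comments and malformed lines
        {
            n++
            rel += $2; ret += $3     # accumulated counts over the top n rules
            if (ret > 0 && rel / ret >= min) { best = n; p = rel / ret }
        }
        END { printf "%d|%.2f%%\n", best, p * 100 }' "$2"
}

# Demo on made-up counts.
rules=$(mktemp)
printf '%s\n' \
    'r1|90|90' \
    'r2|50|52' \
    'r3|30|35' \
    'r4|10|20' > "$rules"

optimum=$(select_optimum 0.95 "$rules")
echo "top N | accumulated precision: $optimum"

rm -f "$rules"
```

With these sample counts the first three rules accumulate 170 relevant out of 177 retrieved, while adding the fourth drops the accumulated precision below 95%, so the sketch keeps the top three.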

IV. Results

The baseline SD-Rule set contains 162 unique PARENT/SELF SD-Rules (no CHILD rules). They are sorted in descending order of precision (= relevant, retrieved No. / retrieved No.) and then by retrieved No. rate. The top 102 SD-Rules are used as the optimized SD-Rule set, covering 95.26% system (accumulated) precision and 87.24% system (accumulated) recall with a system performance (F1) of 1.8251. The total number of valid (relevant, retrieved) instances is 59,911 (from the last column in ./sdRules.stats.out).

-- Total line no: 189
-- Total comment no: 27
-- Total Sd-Rule no: 162
---------------------------------------
-- Optimum SD-Rules: 102|73.13%|67|49|18|0|$|verb|per$|noun|2024|WORDNET|SELF|95.26%|87.24%|1.8251|52268|54867
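
The reported figures can be cross-checked with a little arithmetic. The sketch below assumes that the last two fields of the summary line are the relevant-retrieved and retrieved counts. That reading is an inference from the numbers, not something the report states. It also takes 59,911 as the recall denominator. Under those assumptions the reported 1.8251 matches precision + recall, while the conventional harmonic-mean F1 (2PR / (P + R)) for the same values would be about 0.9108.

```shell
#!/bin/sh
# Summary line copied from the report above.
summary='102|73.13%|67|49|18|0|$|verb|per$|noun|2024|WORDNET|SELF|95.26%|87.24%|1.8251|52268|54867'
total_relevant=59911    # total valid instances quoted in the Results text

# Output fields: precision | recall | precision + recall | harmonic-mean F1
metrics=$(printf '%s\n' "$summary" | awk -F'|' -v total="$total_relevant" '{
    rel = $(NF - 1); ret = $NF    # assumed meaning: relevant-retrieved, retrieved
    p = rel / ret                 # accumulated precision
    r = rel / total               # accumulated recall
    printf "%.2f%%|%.2f%%|%.4f|%.4f\n", p * 100, r * 100, p + r, 2 * p * r / (p + r)
}')
echo "$metrics"
```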