Lexical Tools

Optimizing 2015 SD-Rule Set - Baseline

I. Get the stats (yes|no) from current year data

  • DIR: ${SUFFIXD_DIR}
  • Program:
    shell> cd ${SUFFIXD_DIR}/bin
    shell> GetSuffixD ${YEAR}
    11
  • Outputs:
    • sdRules.stats.rpt.* (sdRules.stats.rpt.pipe is used in this analysis)

II. Remove all Child SD-Rules and use this as the baseline

  • Create a new directory: ${SUFFIXD_DIR}/data/${YEAR}/dateR/SdRulesOptimum/0.baseline
  • shell> cd ${SUFFIXD_DIR}/data/${YEAR}/dateR/SdRulesOptimum/0.baseline
  • shell> cp -p ../../../data/sdRules.stats.rpt.pipe sdRules.stats.in.${YEAR}
  • shell> cp -p sdRules.stats.in.${YEAR} sdRules.stats.in.${YEAR}.removeChild
  • Comment out (prefix with #) all 19 CHILD rules in sdRules.stats.in.${YEAR}.removeChild
  • shell> ln -sf ./sdRules.stats.in.${YEAR}.removeChild sdRules.stats.in
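
The manual comment-out step above could be scripted roughly as follows. This is only a sketch: the function name comment_out_child_rules is hypothetical, and it assumes that each child rule is a pipe-delimited line with one field exactly equal to CHILD, which should be checked against the actual sdRules.stats.rpt.pipe layout before use.

```shell
# Hypothetical helper (not part of the SuffixD tools).
# Assumption: a child rule is a pipe-delimited line with a field equal to CHILD.
comment_out_child_rules() {
  awk -F'|' '
    /^#/ { print; next }              # already a comment: leave untouched
    {
      is_child = 0
      for (i = 1; i <= NF; i++) {
        if ($i == "CHILD") is_child = 1
      }
      if (is_child) print "#" $0      # comment out the child rule
      else print                      # keep parent/self rules as they are
    }
  ' "$1"
}
```

For example, `comment_out_child_rules sdRules.stats.in.${YEAR} > sdRules.stats.in.${YEAR}.removeChild` would produce the edited copy in one pass; counting the newly commented lines afterwards is a quick check that all 19 child rules were caught.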

III. Get the Optimal Set

  • Program:
    shell> cd ${SUFFIXD_DIR}/bin
    shell> GetSdRule ${YEAR}
    1
    others
    0.baseline
    0
  • Output directory:
    • ${SUFFIXD_DIR}/data/${YEAR}/dateR/SdRulesOptimum/0.baseline

IV. Results

The baseline set contains 101 unique parent/self SD-Rules (no child rules). They are sorted in descending order of precision (= number of relevant retrieved instances / number of retrieved instances) and then by the rate of retrieved instances. The top 76 SD-Rules are used as the optimized SD-Rule set, which reaches 95.19% accumulated system precision and 95.71% accumulated system recall, for a system performance of 1.9090. The total number of valid instances is 46950.

-- Total line no: 123
-- Total comment no: 22
-- Total Sd-Rule no: 101
---------------------------------------
-- Optimum SD-Rules: 76|61.70%|188|116|72|0|ar$|adj|e$|noun|2013|ORG_RULE|SELF|95.19%|95.71%|1.9090|44935|47207
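
The reported rates can be cross-checked with a little arithmetic. The sketch below rests on assumptions: it reads the last two fields of the Optimum SD-Rules line (44935 and 47207) as the accumulated relevant-retrieved and retrieved counts, uses the total valid instance number (46950) as the recall denominator, and takes system performance to be precision plus recall. The function name sd_rule_metrics is hypothetical. Under these assumptions the figures above are reproduced.

```shell
# Hypothetical helper (not part of the SuffixD tools).
# Arguments (assumed meaning): relevant-retrieved, retrieved, total relevant.
sd_rule_metrics() {
  awk -v rel_ret="$1" -v ret="$2" -v rel="$3" 'BEGIN {
    p = rel_ret / ret                 # precision
    r = rel_ret / rel                 # recall
    # performance is assumed here to be precision + recall
    printf "precision=%.2f%% recall=%.2f%% performance=%.4f\n", p * 100, r * 100, p + r
  }'
}

sd_rule_metrics 44935 47207 46950
# prints: precision=95.19% recall=95.71% performance=1.9090
```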