Optimizing 2021 SD-Rule Set - Baseline
I. Get the stats (yes|no) from the current year's data
shell> cd ${SUFFIXD_DIR}/bin
shell> GetSuffixD ${YEAR}
11
ALL
II. Establish the baseline: remove all child SD-Rules and use the remaining rules as the baseline
shell> cd ${SUFFIXD_DIR}/data/${YEAR}/dateR/SdRulesOptimum/00.baseline
shell> cp -p ../../../data/sdRules.stats.rpt.pipe sdRules.stats.in.${YEAR}
shell> cp -p sdRules.stats.in.${YEAR} sdRules.stats.in.${YEAR}.removeChild
shell> fgrep "|CHILD" sdRules.stats.in.${YEAR}.removeChild | wc -l
shell> ln -sf ./sdRules.stats.in.${YEAR}.removeChild sdRules.stats.in
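The commands above copy the stats file and count the child rules, but the removal itself is not shown. The following is a minimal sketch of one way to do it, assuming a child rule is any line containing the literal string "|CHILD" (as the fgrep pattern suggests). The function names remove_child_rules and count_child_rules are hypothetical and are not part of the SuffixD tools.

```shell
#!/usr/bin/env bash
# Sketch only: assumes a child rule is any line containing the literal "|CHILD".

# Write every non-child line of in_file to out_file.
remove_child_rules() {
  local in_file=$1 out_file=$2
  # grep -v exits with 1 when every line is filtered out; treat that as success.
  grep -v -F '|CHILD' "$in_file" > "$out_file" || [ $? -eq 1 ]
}

# Print the number of child-rule lines left in a file.
count_child_rules() {
  # grep -c prints 0 (and exits with 1) when nothing matches.
  grep -c -F '|CHILD' "$1" || true
}
```

Under these assumptions, running remove_child_rules sdRules.stats.in.${YEAR} sdRules.stats.in.${YEAR}.removeChild and then count_child_rules on the result should print 0 before the symbolic link is created.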
III. Get the Optimal Set
shell> cd ${SUFFIXD_DIR}/bin
shell> GetSdRule ${YEAR}
1
others
00.baseline
0
IV. Results
The baseline set contains 146 unique parent/self SD-Rules (no child rules). They are sorted in descending order of precision (= relevant, retrieved No./retrieved No.) and then by retrieved No. rate. The top 101 SD-Rules are used as the optimized SD-Rule set, giving 95.01% system (accumulated) precision (50,756/53,421) and 93.39% system (accumulated) recall (50,756/54,347), with a system performance of 1.8840. This performance value is the sum of precision and recall (0.9501 + 0.9339), not a conventional F1 score; the harmonic-mean F1 for the same counts is about 0.9419. The total valid instance (relevant, retrieved) number is 54,347 (from the last column in ./sdRules.stats.out); it is the denominator of the recall figure.
-- Total line no: 173
-- Total comment no: 27
-- Total Sd-Rule no: 146
---------------------------------------
-- Optimum SD-Rules:
101|67.01%|97|65|32|0|al$|noun|e$|verb|2020|NOM_D|SELF|95.01%|93.39%|1.8840|50756|53421
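The reported figures can be cross-checked from the raw counts. The sketch below is illustrative only. sd_metrics recomputes precision, recall, their sum, and the harmonic-mean F1 from three counts. rank_rules shows how accumulated precision and recall could be built up over rules sorted by precision. The function names and the "name|relevant_retrieved|retrieved" input format are assumptions made for this example; they are not the GetSdRule file format, and the criterion GetSdRule actually uses to pick the cutoff is not stated here.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the input format are hypothetical.

# sd_metrics RELEVANT_RETRIEVED RETRIEVED TOTAL_RELEVANT
# Prints: precision%|recall%|precision+recall|F1
sd_metrics() {
  LC_ALL=C awk -v rr="$1" -v ret="$2" -v rel="$3" 'BEGIN {
    p = rr / ret                     # precision
    r = rr / rel                     # recall
    printf "%.2f|%.2f|%.4f|%.4f\n", 100 * p, 100 * r, p + r, 2 * p * r / (p + r)
  }'
}

# rank_rules TOTAL_RELEVANT < rules
# Reads "name|relevant_retrieved|retrieved" lines from stdin, sorts them by
# precision (descending) and then by retrieved count (descending), and prints
# rank|name|accumulated precision|accumulated recall|precision+recall.
rank_rules() {
  local total_relevant=$1
  LC_ALL=C awk -F'|' '$3 > 0 { printf "%.6f|%s|%d|%d\n", $2 / $3, $1, $2, $3 }' |
    LC_ALL=C sort -t'|' -k1,1nr -k4,4nr |
    LC_ALL=C awk -F'|' -v total="$total_relevant" '{
      rr += $3; ret += $4            # running relevant-retrieved and retrieved counts
      p = rr / ret; r = rr / total
      printf "%d|%s|%.2f%%|%.2f%%|%.4f\n", NR, $2, 100 * p, 100 * r, p + r
    }'
}
```

For example, sd_metrics 50756 53421 54347 reproduces the 95.01% precision, 93.39% recall, and 1.8840 performance shown in the summary line, and also reports the harmonic-mean F1 for comparison.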