Optimizing 2015 SD-Rule Set - Parent Rules
I. Find all candidate child rules for 14 parent rules
shell> sort -u ../../../data/suffixD.yesNo.data > ./suffixD.yesNo.data.uSort
shell> flds 1,2,4,5,7 ./suffixD.yesNo.data.uSort > suffixD.yesNo.data.uSort.1.2.4.5.7
shell> ln -sf ./suffixD.yesNo.data.uSort.1.2.4.5.7 sdPairs.data
shell> cd ${SUFFIXD_DIR}/bin
shell> GetSdRule ${YEAR}
7
40 (min. occurrence rate - for decompose)
25 (35) (min. coverage rate - for candidate child)
A child rule must have a higher accuracy rate (precision) than the root parent rule and must meet the min. coverage rate (recall). Manually look through the output file sdRules.decompose.out and search for "<= Candidate". The candidates marked this way are the child rules that match these criteria.
shell> mv sdRules.decompose.out sdRules.decompose.out.no.rule
shell> mv sdRules.decompose.out sdRules.decompose.out.1.X-ally
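The selection criteria above can be sketched as a small filter. This is only an illustration, not the GetSdRule implementation: the `Rule` class, the `is_candidate` helper, and the definition of coverage as the child's retrieved count divided by the parent's retrieved count are assumptions made for this example. The 25% threshold is the min. coverage rate entered above, and the counts are taken from the X-ally parent and its c$ child.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical container for one SD-Rule and its counts."""
    name: str
    retrieved: int  # instances the rule applies to
    relevant: int   # retrieved instances where the rule is correct

    @property
    def precision(self) -> float:
        return self.relevant / self.retrieved


def is_candidate(child: Rule, parent: Rule, min_coverage: float = 0.25) -> bool:
    """Return True if the child beats the parent's precision and covers enough of it.

    Assumption: coverage = child retrieved / parent retrieved.
    """
    coverage = child.retrieved / parent.retrieved
    return child.precision > parent.precision and coverage >= min_coverage


parent = Rule("$|adj|ally$|adv", retrieved=2072, relevant=2053)
child = Rule("c$|adj|cally$|adv", retrieved=1954, relevant=1953)
print(is_candidate(child, parent))
```

Under these assumptions, a child rule with perfect precision but very few instances would still be rejected, because it fails the coverage check.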
II. Replace 14 parent rules with selected candidate child SD-Rules for the optimized set
shell> mkdir 1.X-ally
shell> cd 1.X-ally
shell> cp ../0.baseline/sdRules.stats.in .
#24|99.08%|2072|2053|19|0|$|adj|ally$|adv|2015|ORG_FACT|PARENT
241|99.95%|1954|1953|1|0|c$|adj|cally$|adv|2015|DECOMPOSE|CHILD
#242|99.95%|1949|1948|1|0|ic$|adj|ically$|adv|2015|DECOMPOSE|CHILD
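The rule records above are pipe-delimited, so they are easy to inspect with a short script. The sketch below is an assumption-laden reading of the format, not an official parser: the field names (`retrieved`, `relevant`, `non_relevant`) are inferred from the arithmetic of the records (for example 2053 / 2072 = 99.08%), a leading `#` is treated as "commented out", and the sixth field (always 0 here) is skipped because its meaning is not stated. `parse_rule` is a hypothetical helper.

```python
def parse_rule(line: str) -> dict:
    """Parse one sdRules.stats.in record into a dict (field names are inferred)."""
    active = not line.startswith("#")          # assumption: '#' disables the rule
    fields = line.lstrip("#").strip().split("|")
    return {
        "active": active,
        "id": int(fields[0]),
        "precision_pct": float(fields[1].rstrip("%")),
        "retrieved": int(fields[2]),
        "relevant": int(fields[3]),
        "non_relevant": int(fields[4]),
        "rule": "|".join(fields[6:10]),        # in-suffix|in-cat|out-suffix|out-cat
        "type": fields[12],                    # PARENT, CHILD, or SELF
    }


records = [
    "#24|99.08%|2072|2053|19|0|$|adj|ally$|adv|2015|ORG_FACT|PARENT",
    "241|99.95%|1954|1953|1|0|c$|adj|cally$|adv|2015|DECOMPOSE|CHILD",
    "#242|99.95%|1949|1948|1|0|ic$|adj|ically$|adv|2015|DECOMPOSE|CHILD",
]

for rec in map(parse_rule, records):
    # Recompute precision from the counts as a consistency check.
    recomputed = 100 * rec["relevant"] / rec["retrieved"]
    print(rec["id"], rec["type"], rec["active"], f"{recomputed:.2f}%")
```

Read this way, the edit comments out the parent rule (24) and one child (242) and leaves child 241 active for the next GetSdRule run.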
shell> cd ${SUFFIXD_DIR}/bin
shell> GetSdRule ${YEAR}
1
others
1.X-ally
46950 <= from baseline
-- Optimum SD-Rules: 76|61.70%|188|116|72|0|ar$|adj|e$|noun|2013|ORG_RULE|SELF|95.21%|95.50%|1.9071|44835|47089
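The last five fields of the "Optimum SD-Rules" line can be reproduced from its two trailing counts and the total valid instance number. The reading below is inferred from the numbers rather than stated in the source: the last two fields are taken to be the accumulated relevant count and the accumulated retrieved count, and system performance is taken to be precision plus recall. `system_metrics` is a hypothetical helper.

```python
def system_metrics(relevant: int, retrieved: int, total_valid: int):
    """Return (precision, recall, performance) for accumulated counts.

    Assumptions: precision = relevant / retrieved,
                 recall = relevant / total valid instances,
                 performance = precision + recall.
    """
    precision = relevant / retrieved
    recall = relevant / total_valid
    return precision, recall, precision + recall


# Counts from the log line above; 46950 is the total valid instance number.
p, r, perf = system_metrics(relevant=44835, retrieved=47089, total_valid=46950)
print(f"{p:.2%}|{r:.2%}|{perf:.4f}")  # 95.21%|95.50%|1.9071
```

The printed values match the 95.21%, 95.50%, and 1.9071 fields in the log, which supports this reading of the fields.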
III. Results
Please refer to the optimization log for the details of each step in optimizing these parent-child rules.
The final optimized set includes 101 unique parent, self, and child SD-Rules. They are sorted in descending order of precision (= relevant retrieved No. / retrieved No.) and then by retrieved No. rate. The top 76 SD-Rules are used as the optimized SD-Rule set, which reaches 95.22% system (accumulated) precision and 95.70% system (accumulated) recall, with a system performance of 1.9093. The total valid instance number is 46950.
-- Total line no: 147
-- Total comment no: 46
-- Total Sd-Rule no: 101
---------------------------------------
-- Optimum SD-Rules: 76|61.70%|188|116|72|0|ar$|adj|e$|noun|2013|ORG_RULE|SELF|95.22%|95.70%|1.9093|44933|47187
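The ranking and cutoff described in this section can be sketched as follows. This is a guess at the procedure, not the GetSdRule implementation: rules are sorted by precision and then by retrieved count (both descending), counts are accumulated down the list, and the cutoff is assumed to be the prefix that maximizes precision + recall. The function name `optimum_cutoff` and the three sample rules are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def optimum_cutoff(rules, total_valid):
    """Pick the top-N prefix of ranked rules with the best precision + recall.

    rules: list of (retrieved, relevant) pairs, one per rule.
    Returns (n, precision, recall, performance) for the best prefix.
    """
    # Rank by precision, then by retrieved count, both descending.
    ranked = sorted(rules, key=lambda r: (r[1] / r[0], r[0]), reverse=True)

    best = None
    acc_retrieved = acc_relevant = 0
    for n, (retrieved, relevant) in enumerate(ranked, start=1):
        acc_retrieved += retrieved
        acc_relevant += relevant
        precision = acc_relevant / acc_retrieved   # accumulated precision
        recall = acc_relevant / total_valid        # accumulated recall
        performance = precision + recall
        if best is None or performance > best[3]:
            best = (n, precision, recall, performance)
    return best


# Made-up illustration data: (retrieved, relevant) per rule.
sample_rules = [(40, 10), (100, 99), (50, 45)]
n, precision, recall, performance = optimum_cutoff(sample_rules, total_valid=160)
print(n, f"{precision:.2%}", f"{recall:.2%}", f"{performance:.4f}")
```

With the sample data, adding the third (low-precision) rule raises recall but lowers precision by more, so the best prefix stops at two rules. The same trade-off would explain why only 76 of the 101 rules are kept.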
IV. Post-Process
Update ${SUFFIXD_DIR}/data/${YEAR}/dataOrg/sdRules.data.${NEXT_YEAR} by: