Local Optimization - Evaluate Parent rules and their Child rules
I. Find all candidate child rules for parent rules
shell> cd ${SUFFIXD_DIR}/data/${YEAR}/dataR/SdRulesCheck/
shell> mkdir decompose.40.25
(40: min. local occurrence rate, 25: min. local cover-recall rate)
shell> ln -sf ./decompose.40.25 decompose
shell> sort -u ../../../data/suffixD.yesNo.data > ./suffixD.yesNo.data.uSort
shell> flds 1,2,4,5,7 ./suffixD.yesNo.data.uSort > suffixD.yesNo.data.uSort.1.2.4.5.7
shell> ln -sf ./suffixD.yesNo.data.uSort.1.2.4.5.7 sdPairs.data
suffix-1|pos-1|suffix-2|pos-2
: remove the rest of the line
esis$|noun|ic$|adj
in if this rule is not there.
7
as described below to get all good candidate child rules.
shell> cd ${SUFFIXD_DIR}/bin
shell> GetSdRule ${YEAR}
7
40 (min. occurrence rate - for decompose)
25 (35) (min. coverage rate - for candidate child)
Manually look through the output file sdRules.decompose.out and search for "<= Candidate". These candidates are child rules that meet the following criteria: a child rule must have a higher accuracy rate (precision) than the root parent rule, and it must meet the min. coverage rate (recall, default is 25%).
shell> mv sdRules.decompose.out sdRules.decompose.out.${NO}.${RULE}
shell> mv sdRules.decompose.out sdRules.decompose.out.1.X-ally
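The two candidate criteria (higher precision than the parent, and at least the min. coverage rate) can be sketched as a small check. This is only an illustration, not part of GetSdRule: the helper name is_candidate is hypothetical, and the sketch assumes that coverage means the child's Yes count divided by the parent's Yes count. The counts are taken from the X-ally rules listed in section II.

```shell
# Hypothetical helper (not part of GetSdRule): decide whether a child rule
# qualifies as a candidate replacement for its parent rule.
# Args: parent_total parent_yes child_total child_yes [min_coverage_percent]
# Assumption: coverage = child_yes / parent_yes, expressed as a percentage.
is_candidate() {
  awk -v pt="$1" -v py="$2" -v ct="$3" -v cy="$4" -v min="${5:-25}" 'BEGIN {
    parent_precision = py / pt
    child_precision  = cy / ct
    coverage         = 100 * cy / py
    if (child_precision > parent_precision && coverage >= min)
      print "candidate"
    else
      print "rejected"
  }'
}

# Parent $|adj|ally$|adv:   2075 total, 2056 Yes
# Child  c$|adj|cally$|adv: 1957 total, 1956 Yes
is_candidate 2075 2056 1957 1956
```

With these counts the child has the higher precision (1956/1957 against 2056/2075) and covers well over 25% of the parent's Yes instances, so the check prints "candidate".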
1
II. Replace parent rules by selected candidate child SD-Rules for optimized set
shell> mkdir ${NO}.${RULE}
shell> mkdir 01.X-ally
shell> cd 01.X-ally
shell> cp ../00.baseline/sdRules.stats.in .
#31|99.08%|2075|2056|19|0|$|adj|ally$|adv|2015|ORG_FACT|PARENT
311|99.95%|1957|1956|1|0|c$|adj|cally$|adv|2021|DECOMPOSE|CHILD
#312|99.95%|1952|1951|1|0|ic$|adj|ically$|adv|2021|DECOMPOSE|CHILD
shell> mv sdRules.stats.in sdRules.stats.in.01.1
shell> ln -sf ./sdRules.stats.in.01.1 sdRules.stats.in
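Before rerunning GetSdRule, it can help to confirm which rules are still active in the edited sdRules.stats.in. The sketch below is an assumption-laden illustration: the function name check_rules is hypothetical, lines starting with "#" are treated as commented-out rules, and the field meanings (1 = rule no., 2 = precision, 3 = total, 4 = Yes count) are inferred from the X-ally lines above rather than from a documented format. It lists each active rule and recomputes its precision from the counts.

```shell
# Hypothetical check (not part of the SD-Rule tools): read sdRules.stats.in
# style lines on stdin, skip commented-out rules, and print for each active
# rule: rule no., stored precision, and precision recomputed as Yes / total.
# Assumed fields: 1 = rule no., 2 = precision, 3 = total, 4 = Yes count.
check_rules() {
  grep -v '^#' | awk -F'|' '{ printf "%s %s %.2f%%\n", $1, $2, 100 * $4 / $3 }'
}

# The three X-ally lines from this step: parent and second child commented out.
check_rules <<'EOF'
#31|99.08%|2075|2056|19|0|$|adj|ally$|adv|2015|ORG_FACT|PARENT
311|99.95%|1957|1956|1|0|c$|adj|cally$|adv|2021|DECOMPOSE|CHILD
#312|99.95%|1952|1951|1|0|ic$|adj|ically$|adv|2021|DECOMPOSE|CHILD
EOF
```

For the sample above only rule 311 is reported, with matching stored and recomputed precision. On the real file the same function can be fed with "check_rules < sdRules.stats.in".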
shell> cd ${SUFFIXD_DIR}/bin
shell> GetSdRule ${YEAR}
1
others
01.X-ally
54347 (<= total Yes from baseline)
-- Optimum SD-Rules: 92|63.14%|331|209|122|0|$|noun|ist$|noun|2013|ORG_RULE|SELF|95.05%|94.26%|1.8931|50371|52993
Move the HTML file:
shell> mv sdRules.stats.out.html sdRules.stats.out.01.1.html
shell> cp -p sdRules.stats.out.01.1.html ${WEB_LVG}/docs/designDoc/UDF/derivations/SD-Rules-Opti/Ex-${YEAR}/.
III. Results
Please refer to the optimization log for details of each step in the parent-child rule optimization process.
The final optimized set of SD-Rules includes 148 unique parent/self/child SD-Rules. They are sorted in descending order of precision (= relevant retrieved No. / retrieved No.) and then by retrieved No. rate. The top 104 SD-Rules are used as the optimized SD-Rule set. They cover at least 95.00% system (accumulated) precision (95.12% in the summary line below) and a 93.45% system (accumulated) recall rate, with a system performance of 1.8857. The total valid instance number is 54347.
-- Total line no: 197
-- Total comment no: 49
-- Total Sd-Rule no: 148
---------------------------------------
-- Optimum SD-Rules: 104|65.85%|41|27|14|0|ctic$|adj|xis$|noun|2021|ORG_FACT|SELF|95.12%|93.45%|1.8857|50857|53465
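The numbers in the "Optimum SD-Rules" line can be cross-checked with a short script. This is a sketch under stated assumptions, not the tool's own logic: the function name summarize_optimum is hypothetical, and the field meanings (14 = system precision, 15 = system recall, 16 = system performance, 17 = accumulated relevant retrieved No., 18 = accumulated retrieved No.) are inferred from the two summary lines shown in this document, where the performance equals precision plus recall and the precision equals field 17 divided by field 18.

```shell
# Hypothetical cross-check (not part of GetSdRule): read an
# "-- Optimum SD-Rules: ..." line on stdin and recompute
#   precision   = field 17 / field 18
#   performance = (field 14 + field 15) / 100
# Field meanings are assumptions inferred from the lines in this document.
summarize_optimum() {
  sed 's/^-- Optimum SD-Rules: //' | awk -F'|' '{
    printf "rules=%s precision=%.2f%% performance=%.4f\n",
           $1, 100 * $17 / $18, ($14 + $15) / 100
  }'
}

echo '-- Optimum SD-Rules: 104|65.85%|41|27|14|0|ctic$|adj|xis$|noun|2021|ORG_FACT|SELF|95.12%|93.45%|1.8857|50857|53465' | summarize_optimum
# prints: rules=104 precision=95.12% performance=1.8857
```

The recomputed values agree with the stored fields 14 and 16 of the same line, which is a quick sanity check after each optimization step.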
IV. Post-Process
Generate the SD-Rule trie from this optimized set (the top 104 of 148 SD-Rules) for Lexical Tools SD-Rule generation.
shell> cd ./dataR
shell> cp ./35.ity-y/sdRules.stats.out ./35.ity-y/sdRules.stats.out.opti
shell> ln -sf ./SdRulesOptimum/35.ity-y/sdRules.stats.out.opti sdRules.stats.out
shell> cd ./bin
8
104 (the good rules)