Example - Optimize Baseline
shell> GetSdRule 2013
> 1
> Test
> 0
Rule No. | A. Rate | Occr. | Yes | No | Tbd | SD-Rule | Status | Source | Notes | Sys A. Rate | Sys C. Rate | Sys. Perf |
---|---|---|---|---|---|---|---|---|---|---|---|---|
60 | 73.68% | 19 | 14 | 5 | 0 | a$\|noun\|iasis$\|noun | 2013 | ORG_RULE | SELF | 95.76% | 91.84% | 1.8760 |
Edit sdRule.data by updating the parent-rule:
shell>GetSdRule 2013
7
40 (min. coverage rate for further decomposition)
35 (min. coverage rate for child candidate)
In addition, a child-rule must have a higher accuracy rate than the root parent-rule.
Manually look through the output file sdRule.decompose.out and search for "<= Candidate". These candidates are child-rules that match the following criteria:
- the accuracy rate is higher than parent-rule
- the coverage rate is higher than 35% (or the specified number)
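The two criteria above can be sketched as a small filter. This is only an illustration: it assumes each line has the layout `level|occurrences|yes|no|suffix|category|suffix|category|accuracy|coverage`, inferred from the rows shown in the results table, and it recomputes the check itself rather than relying on the "<= Candidate" marker. The names `RuleLine`, `parse_line`, and `find_candidates` are hypothetical and are not part of GetSdRule.

```python
from dataclasses import dataclass


@dataclass
class RuleLine:
    level: int
    total: int
    yes: int
    no: int
    rule: str
    accuracy: float  # percent
    coverage: float  # percent


def parse_line(line: str) -> RuleLine:
    # Assumed layout (not confirmed by the tool's documentation):
    # level|total|yes|no|suffix1|cat1|suffix2|cat2|accuracy%|coverage%
    parts = line.strip().split("|")
    level, total, yes, no = (int(p) for p in parts[:4])
    return RuleLine(
        level=level,
        total=total,
        yes=yes,
        no=no,
        rule="|".join(parts[4:8]),
        accuracy=float(parts[8].rstrip("%")),
        coverage=float(parts[9].rstrip("%")),
    )


def find_candidates(lines, min_coverage=35.0):
    """Return child rules that beat the parent's accuracy and the coverage floor."""
    parsed = [parse_line(line) for line in lines]
    parent = parsed[0]  # the first line is taken to be the level-0 parent-rule
    return [
        r for r in parsed[1:]
        if r.accuracy > parent.accuracy and r.coverage > min_coverage
    ]


# Parent-rule and child-rules of case 2, copied from the results table.
case_2 = [
    "0|1311|954|357|$|noun|al$|adj|72.77%|100.00%",
    "1|666|550|116|n$|noun|nal$|adj|82.58%|50.80%",
    "2|614|526|88|on$|noun|onal$|adj|85.67%|46.83%",
    "3|571|491|80|ion$|noun|ional$|adj|85.99%|43.55%",
    "4|466|402|64|tion$|noun|tional$|adj|86.27%|35.55%",
]

for r in find_candidates(case_2):
    print(r.rule, r.accuracy, r.coverage)
```

Raising `min_coverage` (for example to 45) narrows the list in the same way as entering a higher minimum coverage rate at the prompt.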
shell>GetSdRule 2013
1
Test
37136 (this is the total valid SD-Pair from parent only rule, baseline)
...
- Combine all cases that give better system performance. A better system performance should have:
- More rules included (ID)
- Higher Sys. Perf (system performance)
The iterative results are shown in the following table:
ID | Parent-Rule | Candidate Child-Rules | Rule No. | A. Rate | Occr. | Yes | No | Tbd | SD-Rule | Status | Source | Notes | Sys A. Rate | Sys C. Rate | Sys. Perf | Result vs. Baseline |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Parent-rule only (Baseline) | No child-Rule | 60 | 73.68% | 19 | 14 | 5 | 0 | a$\|noun\|iasis$\|noun | 2013 | ORG_RULE | SELF | 95.76% | 91.84% | 1.8760 | |
1.1 | 0\|2063\|2016\|47\|$\|adj\|ity$\|noun\|97.72%\|100.00% | 1\|935\|926\|9\|c$\|adj\|city$\|noun\|99.04%\|45.32%<br>1\|725\|707\|18\|l$\|adj\|lity$\|noun\|97.52%\|35.14% | 61 | 73.68% | 19 | 14 | 5 | 0 | a$\|noun\|iasis$\|noun | 2013 | ORG_RULE | SELF | 95.77% | 90.81% | 1.8658 | Worse |
1.2 | 0\|2063\|2016\|47\|$\|adj\|ity$\|noun\|97.72%\|100.00% | 2\|934\|926\|8\|ic$\|adj\|icity$\|noun\|99.14%\|45.27%<br>1\|725\|707\|18\|l$\|adj\|lity$\|noun\|97.52%\|35.14% | 61 | 73.68% | 19 | 14 | 5 | 0 | a$\|noun\|iasis$\|noun | 2013 | ORG_RULE | SELF | 95.77% | 90.81% | 1.8658 | Worse |
2.1 | 0\|1311\|954\|357\|$\|noun\|al$\|adj\|72.77%\|100.00% | 1\|666\|550\|116\|n$\|noun\|nal$\|adj\|82.58%\|50.80% | 64 | 62.65% | 332 | 208 | 124 | 0 | $\|noun\|ist$\|noun | 2013 | ORG_RULE | SELF | 95.10% | 94.16% | 1.8926 | Better |
2.2 | 0\|1311\|954\|357\|$\|noun\|al$\|adj\|72.77%\|100.00% | 2\|614\|526\|88\|on$\|noun\|onal$\|adj\|85.67%\|46.83% | 64 | 62.65% | 332 | 208 | 124 | 0 | $\|noun\|ist$\|noun | 2013 | ORG_RULE | SELF | 95.17% | 94.09% | 1.8926 | Better |
2.3 | 0\|1311\|954\|357\|$\|noun\|al$\|adj\|72.77%\|100.00% | 3\|571\|491\|80\|ion$\|noun\|ional$\|adj\|85.99%\|43.55% | 65 | 60.66% | 183 | 111 | 72 | 0 | ar$\|adj\|e$\|noun | 2013 | ORG_RULE | SELF | 95.01% | 94.30% | 1.8931 | Better (Best) |
2.4 | 0\|1311\|954\|357\|$\|noun\|al$\|adj\|72.77%\|100.00% | 4\|466\|402\|64\|tion$\|noun\|tional$\|adj\|86.27%\|35.55% | 65 | 60.66% | 183 | 111 | 72 | 0 | ar$\|adj\|e$\|noun | 2013 | ORG_RULE | SELF | 95.04% | 94.06% | 1.8910 | Better |
3.1 | 0\|263\|227\|36\|a$\|noun\|an$\|adj\|86.31%\|100.00% | No candidate child-rule found | | | | | | | | | | | | | | Same |
4.1 | 0\|273\|1\|272\|a$\|noun\|an$\|noun\|0.37%\|100.00% | 1\|136\|1\|135\|ia$\|noun\|ian$\|noun\|0.74%\|49.82% | 60 | 73.68% | 19 | 14 | 5 | 0 | a$\|noun\|iasis$\|noun | 2013 | ORG_RULE | SELF | 95.76% | 91.84% | 1.8760 | Same |
5.1 | 0\|138\|120\|18\|a$\|noun\|ar$\|adj\|86.96%\|100.00% | 1\|115\|105\|10\|la$\|noun\|lar$\|adj\|91.30%\|83.33% | 60 | 73.68% | 19 | 14 | 5 | 0 | a$\|noun\|iasis$\|noun | 2013 | ORG_RULE | SELF | 95.78% | 91.80% | 1.8758 | Worse |
5.2 | 0\|138\|120\|18\|a$\|noun\|ar$\|adj\|86.96%\|100.00% | 2\|69\|65\|4\|ula$\|noun\|ular$\|adj\|94.20%\|50.00% | 60 | 73.68% | 19 | 14 | 5 | 0 | a$\|noun\|iasis$\|noun | 2013 | ORG_RULE | SELF | 95.79% | 91.70% | 1.8748 | Worse |
6.1 | 0\|306\|303\|3\|ance$\|noun\|ant$\|adj\|99.02%\|100.00% | No candidate child-rule found | | | | | | | | | | | | | | Same |
7.1 | 0\|2429\|2410\|19\|ation$\|noun\|e$\|verb\|99.22%\|100.00% | 1\|1007\|1006\|1\|sation$\|noun\|se$\|verb\|99.90%\|41.46%<br>1\|1222\|1222\|0\|zation$\|noun\|ze$\|verb\|100.00%\|50.31% | 61 | 73.68% | 19 | 14 | 5 | 0 | a$\|noun\|iasis$\|noun | 2013 | ORG_RULE | SELF | 95.78% | 91.35% | 1.8714 | Worse |
7.2 | 0\|2429\|2410\|19\|ation$\|noun\|e$\|verb\|99.22%\|100.00% | 2\|985\|985\|0\|isation$\|noun\|ise$\|verb\|100.00%\|40.55%<br>1\|1222\|1222\|0\|zation$\|noun\|ze$\|verb\|100.00%\|50.31% | 61 | 73.68% | 19 | 14 | 5 | 0 | a$\|noun\|iasis$\|noun | 2013 | ORG_RULE | SELF | 95.78% | 91.30% | 1.8708 | Worse |
8.1 | 0\|280\|275\|5\|ency$\|noun\|ent$\|adj\|98.21%\|100.00% | No candidate child-rule found | | | | | | | | | | | | | | Same |
9.1 | 0\|1004\|1000\|4\|sis$\|noun\|tic$\|adj\|99.60%\|100.00% | 1\|329\|327\|2\|esis$\|noun\|etic$\|adj\|99.39%\|32.77%<br>1\|366\|366\|0\|osis$\|noun\|otic$\|adj\|100.00%\|36.45%<br>1\|214\|214\|0\|ysis$\|noun\|ytic$\|adj\|100.00%\|21.31% | 62 | 73.68% | 19 | 14 | 5 | 0 | a$\|noun\|iasis$\|noun | 2013 | ORG_RULE | SELF | 95.75% | 91.59% | 1.8735 | Worse |
In the table above, only case 2 gives better results when the parent-rule is replaced by child-rules. Case 2.3 has the best system performance, 1.8931 (system accuracy rate 95.01%, system coverage rate 94.30%), and is therefore used as the optimized set of SD-Rules in place of the original SD-Rules. The diagram below shows the system accuracy rate vs. the system coverage rate. The optimized cutoff point is right around the intersection of these two curves and includes 65 (out of 87) SD-Rules.
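The comparison across cases can be sketched in a few lines. The Sys. Perf figures in the table agree, to within rounding, with the sum of the system accuracy rate and the system coverage rate (for example, 0.9501 + 0.9430 = 1.8931). The sketch below assumes that definition, which is inferred from the numbers and not stated by the tool. The `sys_perf` helper and the `cases` list are hypothetical names used only for illustration.

```python
# (case ID, system accuracy %, system coverage %), copied from the table above
# for the baseline and the four case-2 variants.
cases = [
    ("0", 95.76, 91.84),
    ("2.1", 95.10, 94.16),
    ("2.2", 95.17, 94.09),
    ("2.3", 95.01, 94.30),
    ("2.4", 95.04, 94.06),
]


def sys_perf(accuracy_pct: float, coverage_pct: float) -> float:
    # Assumed definition: accuracy + coverage, both expressed as fractions.
    return round((accuracy_pct + coverage_pct) / 100, 4)


baseline = sys_perf(cases[0][1], cases[0][2])

# Report each case relative to the baseline, then pick the best one.
for case_id, acc, cov in cases[1:]:
    perf = sys_perf(acc, cov)
    verdict = "Better" if perf > baseline else "Worse" if perf < baseline else "Same"
    print(case_id, perf, verdict)

best = max(cases, key=lambda c: sys_perf(c[1], c[2]))
print("best case:", best[0])
```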