Lexical Tools

Example - Optimize Baseline

  • Find the system performance of the normalized set (baseline without child-rules, 87 SD-Rules)
    shell> GetSdRule 2013
    > 1
    > Test
    => Make sure to manually remove all child-rules from the 97-rule set and save the result to sdRules.stats.in
    > 0
    • The total valid SD-Pair instance count is 37,136
    • The cutoff SD-Rule for the optimized set is:
      Rule No.  A. Rate  Occr.  Yes  No   Tbd  SD-Rule              Status  Source    Notes  Sys A. Rate  Sys C. Rate  Sys. Perf
      60        73.68%   19     14   5    0    a$|noun|iasis$|noun  2013    ORG_RULE  SELF   95.76%       91.84%       1.8760
  • Go through each parent-rule, replace it with its child-rules, and compare the results:
    • Decompose parent-rule
      • Decompose the parent-rule and find SD-pairs stats
        Edit sdRule.data to update the parent-rule
        shell> GetSdRule 2013
        7
        40 (min. coverage rate for further decomposition)
        35 (min. coverage rate for child candidate)
        In addition, a child-rule must have a higher accuracy rate than the root parent-rule.
        Manually look through the output file sdRule.decompose.out and search for "<= Candidate". These candidates are child-rules that match the following criteria:
        • the accuracy rate is higher than that of the parent-rule
        • the coverage rate is higher than 35% (or the specified number)
      • The decomposed candidate child-rules (shown in the 3rd column of the table below) replace the parent-rule when evaluating the optimized SD-rule set.
    • Evaluate child-rules
      • Manually replace the parent-rule with the candidate child-rules found, then find the system performance
        shell> GetSdRule 2013
        1
        Test
        37136 (the total number of valid SD-pairs from the parent-only rules, i.e. the baseline)
        ...
      • Combine all cases with better system performance. A better result should have:
        • More rules included (ID)
        • Higher Sys. Perf (system performance)
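
The candidate criteria from the decomposition step can be sketched in Python. This is a minimal illustration, not the GetSdRule implementation: the record layout level|occurrences|yes|no|suffix|category|suffix|category|accuracy|coverage is inferred from the rows of the results table, and the names RuleStats, parse_rule and find_candidates are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RuleStats:
    level: int       # 0 = parent-rule, >0 = decomposition depth (assumed meaning)
    total: int       # SD-pair instances matched by the rule
    yes: int         # valid instances
    no: int          # invalid instances
    rule: str        # e.g. "a$|noun|ar$|adj"
    accuracy: float  # percent
    coverage: float  # percent, relative to the parent-rule


def parse_rule(line: str) -> RuleStats:
    """Parse one record; the 10-field layout is an assumption."""
    f = line.strip().split("|")
    return RuleStats(
        level=int(f[0]), total=int(f[1]), yes=int(f[2]), no=int(f[3]),
        rule="|".join(f[4:8]),
        accuracy=float(f[8].rstrip("%")),
        coverage=float(f[9].rstrip("%")),
    )


def find_candidates(parent, children, min_coverage=35.0):
    """Keep child-rules that beat the parent's accuracy and exceed the coverage threshold."""
    return [c for c in children
            if c.accuracy > parent.accuracy and c.coverage > min_coverage]


# Records copied from case 5 of the results table.
parent = parse_rule("0|138|120|18|a$|noun|ar$|adj|86.96%|100.00%")
children = [
    parse_rule("1|115|105|10|la$|noun|lar$|adj|91.30%|83.33%"),
    parse_rule("2|69|65|4|ula$|noun|ular$|adj|94.20%|50.00%"),
]
candidates = find_candidates(parent, children)
for c in candidates:
    print(c.rule, c.accuracy, c.coverage)
```

In these records the accuracy rate equals yes / occurrences (105 / 115 = 91.30%) and the coverage rate equals the child's occurrences divided by the parent's (115 / 138 = 83.33%).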

    The iterative results are shown in the following table:

    ID   Parent-Rule                                        Candidate Child-Rules                               Rule No.  A. Rate  Occr.  Yes  No   Tbd  SD-Rule              Status  Source    Notes  Sys A. Rate  Sys C. Rate  Sys. Perf  Notes
    0    Parent-rule only (Baseline)                        No child-Rule                                       60        73.68%   19     14   5    0    a$|noun|iasis$|noun  2013    ORG_RULE  SELF   95.76%       91.84%       1.8760
    1.1  0|2063|2016|47|$|adj|ity$|noun|97.72%|100.00%      1|935|926|9|c$|adj|city$|noun|99.04%|45.32%         61        73.68%   19     14   5    0    a$|noun|iasis$|noun  2013    ORG_RULE  SELF   95.77%       90.81%       1.8658     Worse
                                                            1|725|707|18|l$|adj|lity$|noun|97.52%|35.14%
    1.2  0|2063|2016|47|$|adj|ity$|noun|97.72%|100.00%      2|934|926|8|ic$|adj|icity$|noun|99.14%|45.27%       61        73.68%   19     14   5    0    a$|noun|iasis$|noun  2013    ORG_RULE  SELF   95.77%       90.81%       1.8658     Worse
                                                            1|725|707|18|l$|adj|lity$|noun|97.52%|35.14%
    2.1  0|1311|954|357|$|noun|al$|adj|72.77%|100.00%       1|666|550|116|n$|noun|nal$|adj|82.58%|50.80%        64        62.65%   332    208  124  0    $|noun|ist$|noun     2013    ORG_RULE  SELF   95.10%       94.16%       1.8926     Better
    2.2  0|1311|954|357|$|noun|al$|adj|72.77%|100.00%       2|614|526|88|on$|noun|onal$|adj|85.67%|46.83%       64        62.65%   332    208  124  0    $|noun|ist$|noun     2013    ORG_RULE  SELF   95.17%       94.09%       1.8926     Better
    2.3  0|1311|954|357|$|noun|al$|adj|72.77%|100.00%       3|571|491|80|ion$|noun|ional$|adj|85.99%|43.55%     65        60.66%   183    111  72   0    ar$|adj|e$|noun      2013    ORG_RULE  SELF   95.01%       94.30%       1.8931     Better (Best)
    2.4  0|1311|954|357|$|noun|al$|adj|72.77%|100.00%       4|466|402|64|tion$|noun|tional$|adj|86.27%|35.55%   65        60.66%   183    111  72   0    ar$|adj|e$|noun      2013    ORG_RULE  SELF   95.04%       94.06%       1.8910     Better
    3.1  0|263|227|36|a$|noun|an$|adj|86.31%|100.00%        No candidate child-rule found                       -         -        -      -    -    -    -                    -       -         -      -            -            -          Same
    4.1  0|273|1|272|a$|noun|an$|noun|0.37%|100.00%         1|136|1|135|ia$|noun|ian$|noun|0.74%|49.82%         60        73.68%   19     14   5    0    a$|noun|iasis$|noun  2013    ORG_RULE  SELF   95.76%       91.84%       1.8760     Same
    5.1  0|138|120|18|a$|noun|ar$|adj|86.96%|100.00%        1|115|105|10|la$|noun|lar$|adj|91.30%|83.33%        60        73.68%   19     14   5    0    a$|noun|iasis$|noun  2013    ORG_RULE  SELF   95.78%       91.80%       1.8758     Worse
    5.2  0|138|120|18|a$|noun|ar$|adj|86.96%|100.00%        2|69|65|4|ula$|noun|ular$|adj|94.20%|50.00%         60        73.68%   19     14   5    0    a$|noun|iasis$|noun  2013    ORG_RULE  SELF   95.79%       91.70%       1.8748     Worse
    6.1  0|306|303|3|ance$|noun|ant$|adj|99.02%|100.00%     No candidate child-rule found                       -         -        -      -    -    -    -                    -       -         -      -            -            -          Same
    7.1  0|2429|2410|19|ation$|noun|e$|verb|99.22%|100.00%  1|1007|1006|1|sation$|noun|se$|verb|99.90%|41.46%   61        73.68%   19     14   5    0    a$|noun|iasis$|noun  2013    ORG_RULE  SELF   95.78%       91.35%       1.8714     Worse
                                                            1|1222|1222|0|zation$|noun|ze$|verb|100.00%|50.31%
    7.2  0|2429|2410|19|ation$|noun|e$|verb|99.22%|100.00%  2|985|985|0|isation$|noun|ise$|verb|100.00%|40.55%  61        73.68%   19     14   5    0    a$|noun|iasis$|noun  2013    ORG_RULE  SELF   95.78%       91.30%       1.8708     Worse
                                                            1|1222|1222|0|zation$|noun|ze$|verb|100.00%|50.31%
    8.1  0|280|275|5|ency$|noun|ent$|adj|98.21%|100.00%     No candidate child-rule found                       -         -        -      -    -    -    -                    -       -         -      -            -            -          Same
    9.1  0|1004|1000|4|sis$|noun|tic$|adj|99.60%|100.00%    1|329|327|2|esis$|noun|etic$|adj|99.39%|32.77%      62        73.68%   19     14   5    0    a$|noun|iasis$|noun  2013    ORG_RULE  SELF   95.75%       91.59%       1.8735     Worse
                                                            1|366|366|0|osis$|noun|otic$|adj|100.00%|36.45%
                                                            1|214|214|0|ysis$|noun|ytic$|adj|100.00%|21.31%
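
The Sys. Perf values in the table are consistent with the sum of the system accuracy rate and the system coverage rate (for the baseline, 0.9576 + 0.9184 = 1.8760). A few rows differ in the last digit, presumably because the displayed percentages are rounded. Assuming that definition, which this page does not state explicitly, a trial could be labeled against the baseline roughly as follows; system_performance and compare_to_baseline are hypothetical names.

```python
def system_performance(accuracy_pct: float, coverage_pct: float) -> float:
    """Assumed definition: accuracy rate + coverage rate, both as fractions."""
    return round((accuracy_pct + coverage_pct) / 100.0, 4)


def compare_to_baseline(perf: float, baseline: float) -> str:
    """Label a trial the way the last table column does."""
    if perf > baseline:
        return "Better"
    if perf < baseline:
        return "Worse"
    return "Same"


baseline = system_performance(95.76, 91.84)  # ID 0 (parent-rules only)
trial = system_performance(95.01, 94.30)     # ID 2.3
print(baseline, trial, compare_to_baseline(trial, baseline))
```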

In the table above, only case 2 gives better results when the parent-rule is replaced by child-rules. Case 2.3 has the best system performance, 1.8931 (system accuracy rate 95.01%, system coverage rate 94.30%), and is therefore used as our optimized set of SD-rules for the original SD-rules. The diagram below shows the system accuracy rate vs. the system coverage rate. The optimized cutoff point lies right around the intersection of these two curves and includes 65 (out of 87) SD-rules.
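
Picking the optimized set from the trials can be sketched as below, using the Rule No. and Sys. Perf values copied from the table for the baseline and the case-2 rows. Breaking ties in favor of the trial that includes more rules is an assumption based on the "more rules included" criterion, and pick_best is a hypothetical helper.

```python
# (case ID, number of SD-rules included at the cutoff, Sys. Perf), from the table
trials = [
    ("0", 60, 1.8760),
    ("2.1", 64, 1.8926),
    ("2.2", 64, 1.8926),
    ("2.3", 65, 1.8931),
    ("2.4", 65, 1.8910),
]


def pick_best(rows):
    """Highest Sys. Perf wins; ties go to the row that includes more rules (assumed)."""
    return max(rows, key=lambda row: (row[2], row[1]))


best = pick_best(trials)
print(best)  # ('2.3', 65, 1.8931)
```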