Lexical Tools

Example - Add SD-Rules Derived from factD

The original Lexical Tools collects 4,467 SD-pairs, of which 4,110 are suffix SD-pairs. These SD-pairs can be used to derive possible SD-rules by following the same approach as in the nomD section:

  • Identify possible SD-rules by stripping the shared leading characters from each valid SD-pair generated from factD.
  • Select high-frequency SD-rules to add to the SD-rule set:
    Possible SD-rule from factD   Root  Related     Notes
    $|noun|less$|adj|131|131      Yes   None        Selected
    $|verb|$ion|noun|111|111      Yes   Duplicated  Not selected
    ist$|noun|y$|noun|63|63       Yes   None        Selected
    $|adj|ally$|adv|58|58         Yes   None        Selected
      => ic$|adj|ically$|adv is used instead
      => need to verify the root stats
    $|noun|ful$|adj|58|58         Yes   None        Selected
    c$|adj|s$|noun|54|54          Yes   None        Selected
      => ic$|adj|is$|noun is used instead
      => need to verify the root stats
    on$|noun|ve$|adj|38|38        Yes   None        Not selected due to low frequency (coverage)
    ...                           ...   ...         Not selected due to low frequency (coverage)

  • Apply the same procedure used to add SD-rules from nomD to get the optimized set, using the optimized set of 2.3.4 as the new baseline. This task involves:
    • Retrieve all raw SD-pairs from Lexicon (2013) for the five selected SD-rules above
    • Tag the raw SD-pairs
    • Get the stats of the SD-pairs for these five SD-rules
    • Add them to the SD-rule set and find the optimized set
    • Calculate the total valid SD-pair no. (TotalYes) as the total valid SD-pair no. from all parent-rules

    The iterative results are shown as follows:

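    Each record lists two rule lines (the new candidate rule, and the rule
    entry that follows Total Rule No.) in the format:
    Rule No.|A. Rate|Occr.|Yes|No|Tbd|SD-Rule|Status|Source|Notes

    ID: 2.3.4 (prev. optimized set)
      New Candidate Rule:  -
      Total Yes: 39,197    Total Rule No.: 90
      Rule entry:          68|60.66%|183|111|72|0|ar$|adj|e$|noun|2013|ORG_RULE|SELF
      Sys A. Rate: 95.05%    Sys C. Rate: 94.60%    Sys. Perf: 1.8965    Notes: Baseline

    ID: 2.3.4.1
      New Candidate Rule:  12|99.95%|1931|1930|1|0|ic$|adj|ically$|adv|2013|ORG_FACT|SELF
      Total Yes: 41,127 = 39,197 + 1930    Total Rule No.: 91
      Rule entry:          69|60.66%|183|111|72|0|ar$|adj|e$|noun|2013|ORG_RULE|SELF
      Sys A. Rate: 95.28%    Sys C. Rate: 94.85%    Sys. Perf: 1.9013    Notes: Better

    ID: 2.3.4.2
      New Candidate Rule:  15|99.64%|559|557|2|0|$|noun|less$|adj|2013|ORG_FACT|SELF
      Total Yes: 41,684 = 41,127 + 557    Total Rule No.: 92
      Rule entry:          70|60.66%|183|111|72|0|ar$|adj|e$|noun|2013|ORG_RULE|SELF
      Sys A. Rate: 95.34%    Sys C. Rate: 94.92%    Sys. Perf: 1.9026    Notes: Better

    ID: 2.3.4.3
      New Candidate Rule:  40|95.63%|504|482|22|0|ist$|noun|y$|noun|2013|ORG_FACT|SELF
      Total Yes: 42,166 = 41,684 + 482    Total Rule No.: 93
      Rule entry:          71|60.66%|183|111|72|0|ar$|adj|e$|noun|2013|ORG_RULE|SELF
      Sys A. Rate: 95.35%    Sys C. Rate: 94.98%    Sys. Perf: 1.9032    Notes: Better

    ID: 2.3.4.4
      New Candidate Rule:  49|91.70%|277|254|23|0|ic$|adj|is$|noun|2013|ORG_FACT|SELF
      Total Yes: 42,420 = 42,166 + 254    Total Rule No.: 94
      Rule entry:          72|60.66%|183|111|72|0|ar$|adj|e$|noun|2013|ORG_RULE|SELF
      Sys A. Rate: 95.32%    Sys C. Rate: 95.01%    Sys. Perf: 1.9033    Notes: Better

    ID: 2.3.4.5
      New Candidate Rule:  55|89.93%|139|125|14|0|$|noun|ful$|adj|2013|ORG_FACT|SELF
      Total Yes: 42,545 = 42,420 + 125    Total Rule No.: 95
      Rule entry:          73|60.66%|183|111|72|0|ar$|adj|e$|noun|2013|ORG_RULE|SELF
      Sys A. Rate: 95.30%    Sys C. Rate: 95.02%    Sys. Perf: 1.9033    Notes: Best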
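
The rule-identification step in the first bullet above (stripping the shared leading characters of an SD-pair and counting how often each resulting pattern occurs) can be sketched in Python as follows. This is a minimal illustration, not the Lexical Tools implementation: the function names, the `min_count` parameter, and the example word pairs are all hypothetical, and the output string simply mimics the `suffix$|category|suffix$|category` notation used in the tables.

```python
from collections import Counter


def derive_rule(base, base_cat, deriv, deriv_cat):
    """Strip the longest shared leading substring and format a candidate SD-rule."""
    i = 0
    while i < min(len(base), len(deriv)) and base[i] == deriv[i]:
        i += 1
    # What remains after the shared prefix is the suffix change, e.g. "" -> "less".
    return f"{base[i:]}$|{base_cat}|{deriv[i:]}$|{deriv_cat}"


def count_rules(pairs):
    """Count how many SD-pairs produce each candidate rule."""
    return Counter(derive_rule(*pair) for pair in pairs)


def select_frequent(counts, min_count):
    """Keep candidate rules seen at least min_count times (hypothetical cutoff)."""
    return [(rule, n) for rule, n in counts.most_common() if n >= min_count]


# Hypothetical SD-pairs for illustration only; they are not taken from factD.
EXAMPLE_PAIRS = [
    ("hope", "noun", "hopeless", "adj"),
    ("care", "noun", "careless", "adj"),
    ("hope", "noun", "hopeful", "adj"),
    ("botanist", "noun", "botany", "noun"),
    ("basic", "adj", "basically", "adv"),
]

EXAMPLE_COUNTS = count_rules(EXAMPLE_PAIRS)
```

Note that a pair such as basic/basically yields the root pattern `$|adj|ally$|adv` under this stripping, which is consistent with the table listing that root pattern and then substituting the more specific `ic$|adj|ically$|adv`.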

From the above results, all five selected SD-rules (those with the highest frequency and precision from factD) improved the system performance. Thus, all five SD-rules are added to the SD-rule set. Please note that the SD-rules ic$|adj|ically$|adv and ic$|adj|is$|noun are suggested from their root parent-rules $|adj|ally$|adv and c$|adj|s$|noun, respectively. Both root parent-rules should be re-evaluated by this system.
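
The Total Yes arithmetic in the results can be checked with a short Python sketch. It uses the Occr. and Yes counts of the five candidate rules and the baseline Total Yes of 39,197 as reported above. The names are hypothetical, and the accuracy formula (Yes divided by Occr.) is an assumption that happens to reproduce the tabulated A. Rate values.

```python
from itertools import accumulate

BASELINE_TOTAL_YES = 39_197

# (SD-rule, Occr., Yes) for each candidate rule, in the order they were added.
CANDIDATES = [
    ("ic$|adj|ically$|adv", 1931, 1930),
    ("$|noun|less$|adj", 559, 557),
    ("ist$|noun|y$|noun", 504, 482),
    ("ic$|adj|is$|noun", 277, 254),
    ("$|noun|ful$|adj", 139, 125),
]


def running_total_yes(baseline, candidates):
    """Cumulative Total Yes after each candidate rule is added to the baseline."""
    totals = accumulate((yes for _, _, yes in candidates), initial=baseline)
    return list(totals)[1:]  # drop the baseline itself


def accuracy_rate(occurrences, yes):
    """Assumed definition: percentage of tagged occurrences that are valid."""
    return round(100.0 * yes / occurrences, 2)


TOTALS = running_total_yes(BASELINE_TOTAL_YES, CANDIDATES)
RATES = [accuracy_rate(occ, yes) for _, occ, yes in CANDIDATES]
```

Here `TOTALS` holds the five cumulative counts and `RATES` the per-rule accuracy percentages, so both can be compared directly against the corresponding entries in the results.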

The table above shows the iterative results of adding the new rules derived from factD one at a time. With all five rules added, the optimized set includes 73 (out of 95) SD-rules and reaches a coverage rate of 95.02% and a system performance of 1.9033, with an accuracy rate of 95.30%. The diagram below shows the system accuracy and coverage curves of this optimized set.