Lexical Tools

Optimizing 2021 SD-Rule Set - Optimum Log

I. Criteria:

  • Total valid SD-Pairs from the baseline (parent rules only): 54,347
  • Candidate child rules are:
    • Decompose occurrence rate >= 40% (default)
    • Candidate child rules: occurrence rate >= 25% (default)
    • Candidate child rules: precision is determined by the optimization methodology
    • Find the candidate with the maximum precision and recall
    • If a child rule has lower precision and recall than its parent, it is not a good candidate, even if its recall is over 25%.

      For example, in Cases 15 and 16 the system performance is worse because both precision and recall are lower than the parent's. There is no need to run the program for these two cases.

      If a child rule performs worse, its next-generation child rules will also perform worse, so there is no need to run the following generations (although we still run them to keep the log complete).
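
The rule-level figures used by these criteria can be reproduced from the raw counts in the log below. The following is a minimal sketch, not the Lexical Tools implementation. It assumes each rule cell has the layout rank|occurrences|yes|no|from-suffix|from-POS|to-suffix|to-POS|precision|recall, that precision is yes / occurrences, and that a child rule's recall (its occurrence rate) is child occurrences / parent occurrences. These assumptions are inferred from the logged numbers; the names `Rule`, `parse_rule`, `occurrence_rate`, and `is_candidate` are hypothetical.

```python
from dataclasses import dataclass

# Default candidate threshold taken from the criteria above.
MIN_OCCURRENCE_RATE = 0.25


@dataclass
class Rule:
    rank: int
    occurrences: int
    yes: int
    no: int
    pattern: str  # e.g. "c$|adj|cally$|adv"

    @property
    def precision(self) -> float:
        # Assumed definition: valid pairs / all pairs matched by the rule.
        return self.yes / self.occurrences


def parse_rule(cell: str) -> Rule:
    """Parse one pipe-delimited rule cell from the log (assumed layout)."""
    fields = cell.split("|")
    rank, occurrences, yes, no = (int(x) for x in fields[:4])
    return Rule(rank, occurrences, yes, no, "|".join(fields[4:8]))


def occurrence_rate(child: Rule, parent: Rule) -> float:
    # Assumed definition: share of the parent's matches covered by the child.
    return child.occurrences / parent.occurrences


def is_candidate(child: Rule, parent: Rule,
                 threshold: float = MIN_OCCURRENCE_RATE) -> bool:
    return occurrence_rate(child, parent) >= threshold


# Case 1.1 from the log.
parent = parse_rule("0|2075|2056|19|$|adj|ally$|adv|99.08%|100.00%")
child = parse_rule("1|1957|1956|1|c$|adj|cally$|adv|99.95%|94.31%")

print(f"{parent.precision:.2%}")               # 99.08%
print(f"{child.precision:.2%}")                # 99.95%
print(f"{occurrence_rate(child, parent):.2%}") # 94.31%
print(is_candidate(child, parent))             # True
```

The printed values match the precision and recall fields logged for Case 1.1, which is the only evidence for the assumed definitions.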

  • Find the best set by comparing parent vs. child rules:
    • Apply only when the child rule's precision is better than the parent rule's
    • Prefer higher system performance
    • If system performance is the same:
      • Use precision
      • Use recall
      • Use linguistic knowledge

      • Use the parent rule to replace child rules.
      • If no parent-child rules are involved, use more rules
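
The comparison above can be sketched as an ordered key. This is an illustration under stated assumptions, not the program that produced the log: it assumes "system performance" is the F1 column of the log, that the Better/Same/Worse note compares each run with the best run so far, and that ties are broken by precision, then recall. Linguistic knowledge, the last tie-breaker, is a manual judgment and is left out. The names `Evaluation`, `selection_key`, and `label_against_best` are hypothetical.

```python
from typing import NamedTuple


class Evaluation(NamedTuple):
    case_id: str
    precision: float    # system precision, in percent
    recall: float       # system recall, in percent
    performance: float  # the log's F1 column


def selection_key(e: Evaluation) -> tuple:
    # Higher performance first, then precision, then recall.
    return (e.performance, e.precision, e.recall)


def label_against_best(best: Evaluation, candidate: Evaluation) -> str:
    """Label a run by system performance only (assumed meaning of the note)."""
    if candidate.performance > best.performance:
        return "Better"
    if candidate.performance < best.performance:
        return "Worse"
    return "Same"


# Values copied from the log: baseline, then Cases 1.1, 10.1, 10.2, 11.1, 12.1.
baseline = Evaluation("0", 95.01, 93.39, 1.8840)
runs = [
    Evaluation("1.1", 95.01, 93.26, 1.8827),
    Evaluation("10.1", 95.00, 93.44, 1.8844),
    Evaluation("10.2", 95.02, 93.29, 1.8832),
    Evaluation("11.1", 95.00, 93.44, 1.8844),
    Evaluation("12.1", 95.02, 93.44, 1.8846),
]

best = baseline
labels = {}
for run in runs:
    labels[run.case_id] = label_against_best(best, run)
    if selection_key(run) > selection_key(best):
        best = run  # keep the new rule set as the one to beat

print(labels)
# {'1.1': 'Worse', '10.1': 'Better', '10.2': 'Worse', '11.1': 'Same', '12.1': 'Better'}
print(best.case_id)  # 12.1
```

For these five cases the sketch reproduces the notes recorded in the log.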

II. Iterative Optimization Log:

Source:

  • Dir: ${SUFFIX_DIR}/data/${YEAR}/dataR/SdRulesOptimum/*/
  • File: sdRules.stats.out.html

Columns:

  • ID
  • Rank: Parent-Rule
  • Rank: Candidate Child-Rules
  • Cutoff SD-Rules: Rank|Accu. Rate|Occr.|Yes|No|TBD|SD-Rule|Precision|Recall|F1|Accu. Yes|Accu. Occu
  • Notes

0 Rank in Baseline (all Rank): Parent-rule only - Baseline Rank: No child-Rule 101|67.01%|97|65|32|0|al$|noun|e$|verb|2020|NOM_D|SELF|95.01%|93.39%|1.8840|50756|53421 Baseline
1.1 0|2075|2056|19|$|adj|ally$|adv|99.08%|100.00% 1|1957|1956|1|c$|adj|cally$|adv|99.95%|94.31% 102|65.85%|41|27|14|0|ctic$|adj|xis$|noun|2021|ORG_FACT|SELF|95.01%|93.26%|1.8827|50683|53344 Worse
1.2 0|2075|2056|19|$|adj|ally$|adv|99.08%|100.00% 2|1952|1951|1|ic$|adj|ically$|adv|99.95%|94.07% 102|65.85%|41|27|14|0|ctic$|adj|xis$|noun|2021|ORG_FACT|SELF|95.01%|93.25%|1.8826|50678|53339 Worse
2.1 0|2085|2041|44|$|adj|ity$|noun|97.89%|100.00% 1|949|942|7|c$|adj|city$|noun|99.26%|45.52% 1|728|712|16|l$|adj|lity$|noun|97.80%|34.92% F1 of the child rules is lower (by > 5%) than the parent's; past evaluation is worse => No need to evaluate Worse
2.2 0|2085|2041|44|$|adj|ity$|noun|97.89%|100.00% 2|948|942|6|ic$|adj|icity$|noun|99.37%|45.47% 1|728|712|16|l$|adj|lity$|noun|97.80%|34.92% F1 of the child rules is lower (by > 5%) than the parent's; past evaluation is worse => No need to evaluate Worse
3.1 0|1326|968|358|$|noun|al$|adj|73.00%|100.00% 1|673|557|116|n$|noun|nal$|adj|82.76%|50.75% F1 of the child rules is lower (by > 5%) than the parent's; past evaluation is worse => No need to evaluate Worse
3.2 0|1326|968|358|$|noun|al$|adj|73.00%|100.00% 2|621|533|88|on$|noun|onal$|adj|85.83%|46.83% F1 of the child rules is lower (by > 5%) than the parent's; past evaluation is worse => No need to evaluate Worse
3.3 0|1326|968|358|$|noun|al$|adj|73.00%|100.00% 3|577|497|80|ion$|noun|ional$|adj|86.14%|43.51% F1 of the child rules is lower (by > 5%) than the parent's; past evaluation is worse => No need to evaluate Worse
3.4 0|1326|968|358|$|noun|al$|adj|73.00%|100.00% 4|472|408|64|tion$|noun|tional$|adj|86.44%|35.60% F1 of the child rules is lower (by > 5%) than the parent's; past evaluation is worse => No need to evaluate Worse
4.1 0|664|343|321|$|noun|y$|noun|51.66%|100.00% 1|253|234|19|h$|noun|hy$|noun|92.49%|38.10% F1 of the child rules is lower (by > 5%) than the parent's; past evaluation is worse => No need to evaluate Worse
5.1 0|573|537|36|$|verb|ion$|noun|93.72%|100.00% 1|449|434|15|t$|verb|tion$|noun|96.66%|78.36% F1 of the child rules is lower (by > 5%) than the parent's; past evaluation is worse => No need to evaluate Worse
5.2 0|573|537|36|$|verb|ion$|noun|93.72%|100.00% 2|322|320|2|ct$|verb|ction$|noun|99.38%|56.20% F1 of the child rules is lower (by > 5%) than the parent's; past evaluation is worse => No need to evaluate Worse
5.3 0|573|537|36|$|verb|ion$|noun|93.72%|100.00% 3|186|186|0|ect$|verb|ection$|noun|100.00%|32.46% F1 of the child rules is lower (by > 5%) than the parent's; past evaluation is worse => No need to evaluate Worse
6.1 0|266|230|36|a$|noun|an$|adj|86.47%|100.00% No candidate child rules found! No candidate child rules found => No need to evaluate Same
7.1 0|278|3|275|a$|noun|an$|noun|1.08%|100.00% 1|138|2|136|ia$|noun|ian$|noun|1.45%|49.64% 101|67.01%|97|65|32|0|al$|noun|e$|verb|2020|NOM_D|SELF|95.01%|93.39%|1.8840|50756|53421 Same
8.1 0|138|121|17|a$|noun|ar$|adj|87.68%|100.00% 1|116|106|10|la$|noun|lar$|adj|91.38%|84.06% The sum of precision and recall of the child rules is much lower than the parent's; past evaluation is worse => No need to evaluate Worse
8.2 0|138|121|17|a$|noun|ar$|adj|87.68%|100.00% 2|70|66|4|ula$|noun|ular$|adj|94.29%|50.72% The sum of precision and recall of the child rules is much lower than the parent's; past evaluation is worse => No need to evaluate Worse
9.1 0|2530|2511|19|ation$|noun|e$|verb|99.25%|100.00% 1|1063|1062|1|sation$|noun|se$|verb|99.91%|42.02% 1|1257|1257|0|zation$|noun|ze$|verb|100.00%|49.68% The sum of precision and recall of the child rules is much lower than the parent's; past evaluation is worse => No need to evaluate Worse
9.2 0|2530|2511|19|ation$|noun|e$|verb|99.25%|100.00% 2|1039|1039|0|isation$|noun|ise$|verb|100.00%|41.07% 2|1250|1250|0|ization$|noun|ize$|verb|100.00%|49.41% The sum of precision and recall of the child rules is much lower than the parent's; past evaluation is worse => No need to evaluate Worse
9.3 0|2530|2511|19|ation$|noun|e$|verb|99.25%|100.00% 2|1039|1039|0|isation$|noun|ise$|verb|100.00%|41.07% 1|1257|1257|0|zation$|noun|ze$|verb|100.00%|49.68% The sum of precision and recall of the child rules is much lower than the parent's; past evaluation is worse => No need to evaluate Worse
10.1 0|294|262|32|c$|adj|s$|noun|89.12%|100.00% 1|284|260|24|ic$|adj|is$|noun|91.55%|96.60% 102|65.85%|41|27|14|0|ctic$|adj|xis$|noun|2021|ORG_FACT|SELF|95.00%|93.44%|1.8844|50781|53452 Better
10.2 0|294|262|32|c$|adj|s$|noun|89.12%|100.00% 2|192|182|10|tic$|adj|tis$|noun|94.79%|65.31% 102|65.85%|41|27|14|0|ctic$|adj|xis$|noun|2021|ORG_FACT|SELF|95.02%|93.29%|1.8832|50703|53360 Worse
10.3 0|294|262|32|c$|adj|s$|noun|89.12%|100.00% 3|174|170|4|itic$|adj|itis$|noun|97.70%|59.18% 102|65.85%|41|27|14|0|ctic$|adj|xis$|noun|2021|ORG_FACT|SELF|95.03%|93.27%|1.8830|50691|53342 Worse
11.1 0|504|104|400|c$|adj|sm$|noun|20.63%|100.00% 1|503|104|399|ic$|adj|ism$|noun|20.68%|99.80% 102|65.85%|41|27|14|0|ctic$|adj|xis$|noun|2021|ORG_FACT|SELF|95.00%|93.44%|1.8844|50781|53452 Same
12.1 0|883|862|21|ce$|noun|t$|adj|97.62%|100.00% 1|872|862|10|nce$|noun|nt$|adj|98.85%|98.75% 102|65.85%|41|27|14|0|ctic$|adj|xis$|noun|2021|ORG_FACT|SELF|95.02%|93.44%|1.8846|50781|53441 Better
12.2 0|883|862|21|ce$|noun|t$|adj|97.62%|100.00% 2|333|330|3|ance$|noun|ant$|adj|99.10%|37.71% 2|539|532|7|ence$|noun|ent$|adj|98.70%|61.04% 103|65.85%|41|27|14|0|ctic$|adj|xis$|noun|2021|ORG_FACT|SELF|95.02%|93.44%|1.8846|50781|53441 Same
13.1 0|416|411|5|cy$|noun|t$|adj|98.80%|100.00% 1|415|410|5|ncy$|noun|nt$|adj|98.80%|99.76% 102|65.85%|41|27|14|0|ctic$|adj|xis$|noun|2021|ORG_FACT|SELF|95.02%|93.44%|1.8846|50780|53440 Same
14.1 0|2349|2321|28|e$|verb|ion$|noun|98.81%|100.00% 1|2212|2204|8|te$|verb|tion$|noun|99.64%|94.17% The sum of precision and recall of the child rules is much lower than the parent's; past evaluation is worse => No need to evaluate Worse
14.2 0|2349|2321|28|e$|verb|ion$|noun|98.81%|100.00% 2|2108|2104|4|ate$|verb|ation$|noun|99.81%|89.74% The sum of precision and recall of the child rules is much lower than the parent's; past evaluation is worse => No need to evaluate Worse
14.3 0|2349|2321|28|e$|verb|ion$|noun|98.81%|100.00% 3|602|602|0|late$|verb|lation$|noun|100.00%|25.63% The sum of precision and recall of the child rules is much lower than the parent's; past evaluation is worse => No need to evaluate Worse
15.1 0|145|138|7|e$|verb|is$|noun|95.17%|100.00% 1|141|138|3|se$|verb|sis$|noun|97.87%|97.24% 103|65.85%|41|27|14|0|ctic$|adj|xis$|noun|2021|ORG_FACT|SELF|95.03%|93.46%|1.8849|50919|53582 Better
15.2 0|145|138|7|e$|verb|is$|noun|95.17%|100.00% 2|54|52|2|ose$|verb|osis$|noun|96.30%|37.24% 2|59|59|0|yse$|verb|ysis$|noun|100.00%|40.69% 104|65.85%|41|27|14|0|ctic$|adj|xis$|noun|2021|ORG_FACT|SELF|95.03%|93.46%|1.8848|50892|53554 Worse
15.3 0|145|138|7|e$|verb|is$|noun|95.17%|100.00% 2|54|52|2|ose$|verb|osis$|noun|96.30%|37.24% 3|58|58|0|lyse$|verb|lysis$|noun|100.00%|40.00% 104|65.85%|41|27|14|0|ctic$|adj|xis$|noun|2021|ORG_FACT|SELF|95.03%|93.46%|1.8848|50891|53553 Worse
16.1 0|224|207|17|esis$|noun|ic$|adj|92.41%|100.00% 1|209|206|3|nesis$|noun|nic$|adj|98.56%|93.30% 103|65.85%|41|27|14|0|ctic$|adj|xis$|noun|2021|ORG_FACT|SELF|95.05%|93.46%|1.8851|50918|53567 Better
16.2 0|224|207|17|esis$|noun|ic$|adj|92.41%|100.00% 2|207|206|1|enesis$|noun|enic$|adj|99.52%|92.41% 103|65.85%|41|27|14|0|ctic$|adj|xis$|noun|2021|ORG_FACT|SELF|95.06%|93.46%|1.8852|50918|53565 Better
16.3 0|224|207|17|esis$|noun|ic$|adj|92.41%|100.00% 3|207|206|1|genesis$|noun|genic$|adj|99.52%|92.41% 103|65.85%|41|27|14|0|ctic$|adj|xis$|noun|2021|ORG_FACT|SELF|95.06%|93.46%|1.8852|50918|53565 Same
16.4 0|224|207|17|esis$|noun|ic$|adj|92.41%|100.00% 4|181|181|0|ogenesis$|noun|ogenic$|adj|100.00%|80.80% 103|65.85%|41|27|14|0|ctic$|adj|xis$|noun|2021|ORG_FACT|SELF|95.06%|93.46%|1.8851|50893|53539 Worse
17.1 0|1634|1633|1|ility$|noun|le$|adj|99.94%|100.00% 1|1632|1632|0|bility$|noun|ble$|adj|100.00%|99.88% 103|65.85%|41|27|14|0|ctic$|adj|xis$|noun|2021|ORG_FACT|SELF|95.06%|93.46%|1.8852|50917|53563 Same
17.2 0|1634|1633|1|ility$|noun|le$|adj|99.94%|100.00% 2|1294|1294|0|ability$|noun|able$|adj|100.00%|79.19% 103|65.85%|41|27|14|0|ctic$|adj|xis$|noun|2021|ORG_FACT|SELF|95.03%|93.07%|1.8810|50579|53225 Worse
18.1 0|1017|1012|5|sis$|noun|tic$|adj|99.51%|100.00% 1|336|334|2|esis$|noun|etic$|adj|99.40%|33.04% 1|369|368|1|osis$|noun|otic$|adj|99.73%|36.28% 1|216|216|0|ysis$|noun|ytic$|adj|100.00%|21.24% The sum of precision and recall of the child rules is much lower than the parent's; past evaluation is worse => No need to evaluate Worse
19.1 0|101|90|11|sity$|noun|us$|adj|89.11%|100.00% 1|100|90|10|osity$|noun|ous$|adj|90.00%|99.01% 103|65.85%|41|27|14|0|ctic$|adj|xis$|noun|2021|ORG_FACT|SELF|95.06%|93.46%|1.8852|50918|53564 Same
20.1 0|60|58|2|sis$|noun|ze$|verb|96.67%|100.00% 1|57|57|0|ysis$|noun|yze$|verb|100.00%|95.00% 103|65.85%|41|27|14|0|ctic$|adj|xis$|noun|2021|ORG_FACT|SELF|95.06%|93.46%|1.8852|50917|53562 Same
20.2 0|60|58|2|sis$|noun|ze$|verb|96.67%|100.00% 2|57|57|0|lysis$|noun|lyze$|verb|100.00%|95.00% 103|65.85%|41|27|14|0|ctic$|adj|xis$|noun|2021|ORG_FACT|SELF|95.06%|93.46%|1.8852|50917|53562 Same
20.3 0|60|58|2|sis$|noun|ze$|verb|96.67%|100.00% 3|28|28|0|alysis$|noun|alyze$|verb|100.00%|46.67% 3|29|29|0|olysis$|noun|olyze$|verb|100.00%|48.33% 104|65.85%|41|27|14|0|ctic$|adj|xis$|noun|2021|ORG_FACT|SELF|95.06%|93.46%|1.8852|50917|53562 Same
24.1 0|71|51|20|$|verb|nce$|noun|71.83%|100.00% 1|71|51|20|e$|verb|ence$|noun|71.83%|100.00% 103|65.85%|41|27|14|0|ctic$|adj|xis$|noun|2021|ORG_FACT|SELF|95.06%|93.46%|1.8852|50918|53565 Same
24.2 0|71|51|20|$|verb|nce$|noun|71.83%|100.00% 2|18|16|2|ge$|verb|gence$|noun|88.89%|25.35% 103|65.85%|41|27|14|0|ctic$|adj|xis$|noun|2021|ORG_FACT|SELF|95.09%|93.45%|1.8854|50883|53512 Better
28.1 0|44|41|3|d$|verb|sion$|noun|93.18%|100.00% 1|44|41|3|nd$|verb|nsion$|noun|93.18%|100.00% 103|65.85%|41|27|14|0|ctic$|adj|xis$|noun|2021|ORG_FACT|SELF|95.09%|93.45%|1.8854|50883|53512 Same
34.1 0|49|46|3|er$|verb|ration$|noun|93.88%|100.00% 1|48|46|2|ter$|verb|tration$|noun|95.83%|97.96% 103|65.85%|41|27|14|0|ctic$|adj|xis$|noun|2021|ORG_FACT|SELF|95.09%|93.45%|1.8854|50883|53511 Same
34.2 0|49|46|3|er$|verb|ration$|noun|93.88%|100.00% 2|15|15|0|lter$|verb|ltration$|noun|100.00%|30.61% 2|29|29|0|ster$|verb|stration$|noun|100.00%|59.18% 104|65.85%|41|27|14|0|ctic$|adj|xis$|noun|2021|ORG_FACT|SELF|95.09%|93.45%|1.8855|50881|53507 Better
34.3 0|49|46|3|er$|verb|ration$|noun|93.88%|100.00% 2|15|15|0|lter$|verb|ltration$|noun|100.00%|30.61% 3|26|26|0|ister$|verb|istration$|noun|100.00%|53.06% 104|65.85%|41|27|14|0|ctic$|adj|xis$|noun|2021|ORG_FACT|SELF|95.09%|93.45%|1.8855|50878|53504 Same
34.4 0|49|46|3|er$|verb|ration$|noun|93.88%|100.00% 2|15|15|0|lter$|verb|ltration$|noun|100.00%|30.61% 4|14|14|0|gister$|verb|gistration$|noun|100.00%|28.57% 104|65.85%|41|27|14|0|ctic$|adj|xis$|noun|2021|ORG_FACT|SELF|95.09%|93.45%|1.8854|50866|53492 Worse
35.1 0|63|43|20|ity$|noun|y$|adj|68.25%|100.00% 1|57|42|15|rity$|noun|ry$|adj|73.68%|90.48% 104|65.85%|41|27|14|0|ctic$|adj|xis$|noun|2021|ORG_FACT|SELF|95.10%|93.45%|1.8855|50880|53501 Same
35.2 0|63|43|20|ity$|noun|y$|adj|68.25%|100.00% 2|55|42|13|arity$|noun|ary$|adj|76.36%|87.30% 104|65.85%|41|27|14|0|ctic$|adj|xis$|noun|2021|ORG_FACT|SELF|95.10%|93.45%|1.8856|50880|53499 Better
35.3 0|63|43|20|ity$|noun|y$|adj|68.25%|100.00% 3|21|19|2|narity$|noun|nary$|adj|90.48%|33.33% 104|65.85%|41|27|14|0|ctic$|adj|xis$|noun|2021|ORG_FACT|SELF|95.12%|93.45%|1.8857|50857|53465 Best
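
The system-level figures in the Cutoff SD-Rules column appear to follow from the accumulated counts. The following is a minimal check against the baseline row (ID 0), not the tool's own code. It assumes precision = Accu. Yes / Accu. Occu, recall = Accu. Yes / total valid SD-Pairs (54,347, from the criteria), and that the column labelled F1 is the sum of precision and recall rather than the usual harmonic mean. These definitions are inferred from the baseline numbers and are not confirmed for later rows; the name `system_metrics` is hypothetical.

```python
# Total valid SD-Pairs from the baseline, as stated in the criteria.
TOTAL_VALID_SD_PAIRS = 54_347


def system_metrics(accu_yes: int, accu_occu: int,
                   total_valid: int = TOTAL_VALID_SD_PAIRS):
    """Return (precision, recall, score) under the assumed definitions."""
    precision = accu_yes / accu_occu
    recall = accu_yes / total_valid
    # Assumption: the log's "F1" column is precision + recall.
    return precision, recall, precision + recall


# Baseline row: Accu. Yes = 50756, Accu. Occu = 53421.
precision, recall, score = system_metrics(50_756, 53_421)
print(f"{precision:.2%} {recall:.2%} {score:.4f}")
# 95.01% 93.39% 1.8840
```

The output agrees with the baseline row's logged 95.01%, 93.39%, and 1.8840.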