Lexical Tools

Comparison of the Optimized Sets for 2014, 2015, and 2016 (TBD)

I. New SD-Rules Evaluation Results:

Three releases applied this approach to retrieve the optimized SD-rule set.

  • 2014 release: the first release to apply this approach to retrieve the optimized set (based on the 2013 SD-Rules).
  • 2015 release: 15 new SD-Rules were added to the 2014 release for evaluation.
    • Total candidate SD-pairs: 53,905
    • Total valid candidate SD-pairs (SD-Facts: relevant): 46,950

    • 2 are duplicates (child rules of existing rules).
    • 11 (84.62%, 11/13) of the remaining 13 are evaluated as good rules in the optimized set
    • 2 (15.38%, 2/13) are bad rules

    • In the optimized set, 2 child rules are used to replace proposed rules.
    • Details:
      SD-Rule                   Rank  Precision  Instances  Source       Decompose    Results

      Duplicated Rules
      ian$|adj|ia$|noun         57    86.31%     263        Suggestions  1G-Child     Duplicated of good parent-rule an$|adj|a$|noun
      ian$|noun|ia$|noun        99    0.36%      274        Suggestions  1G-Child     Duplicated of bad parent-rule an$|noun|a$|noun

      Good Rules
      se$|verb|zation$|noun     2     100.00%    1108       NOM_D        Root-Parent  Good SD-Rule
      sation$|noun|ze$|verb     3     100.00%    1071       NOM_D        Root-Parent  Good SD-Rule
      ility$|noun|le$|adj       9     99.94%     1625       NOM_D        Root-Parent  Good SD-Rule
      $|adj|ally$|adv           15    99.08%     2072       ORG_D        Root-Parent  Good SD-Rule
      nce$|noun|nt$|adj         18    98.82%     847        NOM_D        1G-Child     Good SD-Rule
      cy$|noun|t$|adj           19    98.77%     406        NOM_D        Root-Parent  Good SD-Rule
      e$|verb|ion$|noun         20    98.76%     2336       NOM_D        Root-Parent  Good SD-Rule
      ic$|adj|is$|noun          43    91.46%     281        ORG_D        1G-Child     Good SD-Rule
      e$|verb|ing$|noun         45    91.43%     210        Suggestions  Root-Parent  Good SD-Rule
      al$|adj|us$|noun          61    84.35%     262        Suggestions  Root-Parent  Good SD-Rule
      es$|noun|ic$|adj          67    73.91%     23         Suggestions  Root-Parent  Good SD-Rule

      Bad Rules
      $|noun|ize$|verb          78    59.05%     442        Suggestions  Root-Parent  Bad SD-Rule
      es$|noun|ic$|noun         101   0.00%      19         Suggestions  Root-Parent  Bad SD-Rule

  • 2016 release: 12 new SD-Rules were added to the 2015 release for evaluation.
    • Total candidate SD-pairs: 58,422
    • Total valid candidate SD-pairs: 50,814

    • 1 is a duplicate (of an existing rule).
    • 8 (72.73%, 8/11) of the remaining 11 are evaluated as good rules in the optimized set
    • 3 (27.27%, 3/11) are bad rules

    • In the optimized set, 2 child rules are used to replace proposed rules.
    • Details:
      SD-Rule                   Rank  Precision  Instances  Source       Decompose    Results

      Duplicated Rules
      e$|verb|ing$|noun         47    91.47%     211        NOM_D        Root-Parent  Duplicated of a good rule

      Good Rules
      genesis$|noun|genic$|adj  13    99.52%     207        EXP_SUG      3G-Child     Good SD-Rule
      se$|verb|sis$|noun        27    97.87%     141        NOM_D        1G-Child     Good SD-Rule
      sia$|noun|tic$|adj        40    94.17%     103        ORG_D        Root-Parent  Good SD-Rule
      on$|noun|ve$|adj          48    91.46%     1253       ORG_D        Root-Parent  Good SD-Rule
      e$|noun|ic$|adj           49    91.40%     1267       ORG_D        Root-Parent  Good SD-Rule
      $|adj|ism$|noun           51    90.79%     369        NOM_D        Root-Parent  Good SD-Rule
      ation$|noun|ed$|adj       67    83.95%     405        NOM_D        Root-Parent  Good SD-Rule
      $|noun|ship$|noun         70    80.45%     133        ORG_D        Root-Parent  Good SD-Rule

      Bad Rules
      e$|adj|ion$|noun          88    54.60%     359        NOM_D        Root-Parent  Bad SD-Rule
      $|noun|age$|noun          96    36.97%     119        ORG_D        Root-Parent  Bad SD-Rule
      al$|adj|ine$|noun         98    32.65%     49         EXP_SUG      Root-Parent  Bad SD-Rule
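
The SD-Rule strings in the tables above appear to follow the pattern suffix1$|category1|suffix2$|category2. The Python sketch below illustrates that reading only. It assumes the forward direction (suffix1 to suffix2), and parse_sd_rule and apply_sd_rule are hypothetical helper names, not part of Lexical Tools. The example words are chosen for illustration; a generated pair is only a candidate and still has to be validated.

```python
from typing import NamedTuple, Optional, Tuple


class SdRule(NamedTuple):
    """One suffix-derivation rule, with the trailing '$' markers removed."""
    suffix_1: str
    category_1: str
    suffix_2: str
    category_2: str


def parse_sd_rule(text: str) -> SdRule:
    """Split 'suffix1$|cat1|suffix2$|cat2' into its four fields (assumed format)."""
    fields = text.split("|")
    if len(fields) != 4:
        raise ValueError(f"expected 4 '|'-separated fields: {text!r}")
    suffix_1, category_1, suffix_2, category_2 = fields
    if not (suffix_1.endswith("$") and suffix_2.endswith("$")):
        raise ValueError(f"suffix fields must end with '$': {text!r}")
    return SdRule(suffix_1[:-1], category_1, suffix_2[:-1], category_2)


def apply_sd_rule(rule: SdRule, word: str, category: str) -> Optional[Tuple[str, str]]:
    """Return the candidate (word, category) derived by the rule, or None."""
    if category != rule.category_1 or not word.endswith(rule.suffix_1):
        return None
    # Slice by length so that an empty suffix_1 keeps the whole word.
    stem = word[: len(word) - len(rule.suffix_1)]
    return stem + rule.suffix_2, rule.category_2


rule = parse_sd_rule("se$|verb|zation$|noun")
print(apply_sd_rule(rule, "organise", "verb"))
print(apply_sd_rule(parse_sd_rule("$|adj|ally$|adv"), "basic", "adj"))
```

Slicing by length rather than with a negative index keeps rules with an empty first suffix, such as $|adj|ally$|adv, from discarding the whole word.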

II. Comparison of SD-Rule set:

Item                                         2014                2015                2016
Baseline Set (including parent-child rules)  107                 120                 132
Total Unique Rules                           96                  101                 111
Total Good Rules                             73                  76                  82
Total Valid SD-pairs (SD-Facts: Relevant)    42,552              46,950              50,814
Opti. System Precision                       95.30%              95.22%              95.00%
Opti. System Recall                          95.01%              95.70%              95.26%
Opti. System Performance                     1.9031              1.9093              1.9026
Cutoff Rule                                  ar$|adj|e$|noun     ar$|adj|e$|noun     $|noun|ist$|noun
Optimized Set                                2014 Optimized Set  2015 Optimized Set  2016 Optimized Set
Optimized Diagram
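
The Opti. System Performance values listed above are close to the sum of the precision and recall figures expressed as fractions. This is an observation from the listed numbers rather than a definition stated here, so the sketch below treats it as an assumption; the function name performance is hypothetical.

```python
# Figures copied from the comparison table (precision and recall in percent).
releases = {
    2014: {"precision": 95.30, "recall": 95.01, "performance": 1.9031},
    2015: {"precision": 95.22, "recall": 95.70, "performance": 1.9093},
    2016: {"precision": 95.00, "recall": 95.26, "performance": 1.9026},
}


def performance(precision_pct: float, recall_pct: float) -> float:
    """Assumed relationship: performance = precision + recall, as fractions."""
    return (precision_pct + recall_pct) / 100.0


for year, row in releases.items():
    computed = performance(row["precision"], row["recall"])
    difference = abs(computed - row["performance"])
    print(f"{year}: computed={computed:.4f} listed={row['performance']:.4f} "
          f"diff={difference:.4f}")
```

Under this assumption the 2014 and 2016 values match the listed figures, and the 2015 value differs by 0.0001, which is consistent with the listed percentages having been rounded.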

For the optimal set:

  • The optimized sets of the 2014 and 2015 releases are similar; see SD-Rule rank mapping, 2014-15 for details.
  • The 2014 optimal set has 96 SD-Rules, 73 of which are good.
  • The 2015 optimal set has 101 SD-Rules, 76 of which are good.
  • The 2016 optimal set has 111 SD-Rules, 82 of which are good.
  • All good rules in the 2014 set are in the 2015 set.
  • All good rules in the 2015 set are in the 2016 set, except for 1 (ar$|adj|e$|noun).
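
The release-to-release comparison in the list above amounts to a set difference over the good rules. The Python sketch below uses small excerpts built from rules named on this page, not the full 76-rule and 82-rule sets; the helper names dropped_rules and added_rules are hypothetical.

```python
from typing import Set

# Excerpts only: a few good rules taken from this page, not the complete sets.
good_2015_excerpt: Set[str] = {
    "ar$|adj|e$|noun",
    "se$|verb|zation$|noun",
    "ility$|noun|le$|adj",
}
good_2016_excerpt: Set[str] = {
    "se$|verb|zation$|noun",
    "ility$|noun|le$|adj",
    "genesis$|noun|genic$|adj",
}


def dropped_rules(older: Set[str], newer: Set[str]) -> Set[str]:
    """Good rules in the older release that are missing from the newer one."""
    return older - newer


def added_rules(older: Set[str], newer: Set[str]) -> Set[str]:
    """Good rules that appear only in the newer release."""
    return newer - older


print(sorted(dropped_rules(good_2015_excerpt, good_2016_excerpt)))
print(sorted(added_rules(good_2015_excerpt, good_2016_excerpt)))
```

Run against the full good-rule lists of two releases, an empty dropped_rules result would correspond to a statement such as "all good rules in the 2014 set are in the 2015 set".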

III. Transaction History:

Columns:

  • Baseline: collected candidate SD-Rules
  • Unique Rules: child-rules removed from the baseline
  • Good Rules: used in the Lexical Tools SD-Rule set

Release  Baseline  Unique Rules  Good Rules
2014     107       96            73
2015     120       101           76
2016     132       111           82

2014:
  • Unique Rules (96):
    • removed 11 child-rules from baseline
    • 96 = 107 - 11

New Rules: 15
  •                      ES (Expert-Suggest)  NOM_D  ORG_D  Sub-Total
    Total Rules          7                    6      2      15
    Duplicated           2                    0      0      2
    Total non-dup-rules  5                    6      2      13
    Bad Rules            2                    0      0      2
    Good Rules           3                    6      2      11
  • details

2015:
  • Baseline (120):
    • 2 new rules out of 15 are child-rules of existing rules, not added
    • 120 = 107 + 15 - 2
  • Good Rules (76):
    • 4 of the good new rules are parent-rules of 4 existing rules (+0)
    • 2 of the good new rules are parent-rules of 4 existing rules (-2)
    • 5 of the good new rules have no parent-rule relationship with existing rules (+5)
    • 76 = 73 + 0 - 2 + 5

New Rules: 12
  •                      ES (Expert-Suggest)  NOM_D  ORG_D  Sub-Total
    Total Rules          2                    5      5      12
    Duplicated           0                    1      0      1
    Total non-dup-rules  2                    4      5      11
    Bad Rules            1                    1      1      3
    Good Rules           1                    3      4      8
  • details

2016:
  • Baseline (132):
    • 1 existing rule added a child-rule (nce$|noun|nt$|adj) in 2015
    • 1 new rule out of 12 is duplicated, not added
    • 132 = 120 + 1 + 12 - 1
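
The bookkeeping in the transaction history can be re-checked mechanically. The Python sketch below re-derives the counts from the figures given on this page; the NewRuleBatch class and its field names are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NewRuleBatch:
    """Counts for one release's batch of proposed SD-Rules."""
    total: int
    duplicated: int
    bad: int

    @property
    def non_duplicated(self) -> int:
        return self.total - self.duplicated

    @property
    def good(self) -> int:
        return self.non_duplicated - self.bad

    def good_share(self) -> float:
        """Percentage of non-duplicated rules evaluated as good."""
        return round(100.0 * self.good / self.non_duplicated, 2)


batch_2015 = NewRuleBatch(total=15, duplicated=2, bad=2)
batch_2016 = NewRuleBatch(total=12, duplicated=1, bad=3)

baseline_2014 = 107
unique_2014 = baseline_2014 - 11                       # child-rules removed
baseline_2015 = baseline_2014 + batch_2015.total - batch_2015.duplicated
good_2015 = 73 + 0 - 2 + 5                             # adjustments listed for 2015
# The extra +1 is the child-rule added to an existing rule in 2015.
baseline_2016 = baseline_2015 + 1 + batch_2016.total - batch_2016.duplicated

print(unique_2014, baseline_2015, good_2015, baseline_2016)
print(batch_2015.good, batch_2015.good_share())
print(batch_2016.good, batch_2016.good_share())
```

The derived values agree with the listed ones: 96, 120, 76, and 132 for the set sizes, and 11 of 13 (84.62%) and 8 of 11 (72.73%) for the good new rules.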

Details:

In conclusion, the optimized set of SD-Rules is very stable across the three releases, as expected.