
Lexical Tools

Comparison of Optimized Sets for 2014, 2015, and 2016 (TBD)

I. New SD-Rules Evaluation Results:

Three releases applied this approach to retrieve the optimized SD-rule set.

  • 2014 release: the first release to apply this approach to retrieve the optimized set (based on the 2013 SD-Rules).
  • 2015 release: 15 new SD-Rules are added to the 2014 release for evaluation.
    • Total candidate SD-pairs: 53,905
    • Total valid candidate SD-pairs (SD-Facts: relevant): 46,950

    • 2 are duplicated (child rules of existing rules).
    • 11 (84.62%, 11/13) of them are evaluated as good rules in the optimized set
    • 2 (15.38%, 2/13) are bad rules

    • In the optimized set, 2 child rules are used to replace proposed rules.
    • Details:
      SD-Rule                   Rank  Precision  Instances  Source       Decompose    Results
      Duplicated Rules
      ian$|adj|ia$|noun         57    86.31%     263        Suggestions  1G-Child     Duplicate of good parent-rule an$|adj|a$|noun
      ian$|noun|ia$|noun        99    0.36%      274        Suggestions  1G-Child     Duplicate of bad parent-rule an$|noun|a$|noun
      Good Rules
      se$|verb|zation$|noun     2     100.00%    1108       NOM_D        Root-Parent  Good SD-Rule
      sation$|noun|ze$|verb     3     100.00%    1071       NOM_D        Root-Parent  Good SD-Rule
      ility$|noun|le$|adj       9     99.94%     1625       NOM_D        Root-Parent  Good SD-Rule
      $|adj|ally$|adv           15    99.08%     2072       ORG_D        Root-Parent  Good SD-Rule
      nce$|noun|nt$|adj         18    98.82%     847        NOM_D        1G-Child     Good SD-Rule
      cy$|noun|t$|adj           19    98.77%     406        NOM_D        Root-Parent  Good SD-Rule
      e$|verb|ion$|noun         20    98.76%     2336       NOM_D        Root-Parent  Good SD-Rule
      ic$|adj|is$|noun          43    91.46%     281        ORG_D        1G-Child     Good SD-Rule
      e$|verb|ing$|noun         45    91.43%     210        Suggestions  Root-Parent  Good SD-Rule
      al$|adj|us$|noun          61    84.35%     262        Suggestions  Root-Parent  Good SD-Rule
      es$|noun|ic$|adj          67    73.91%     23         Suggestions  Root-Parent  Good SD-Rule
      Bad Rules
      $|noun|ize$|verb          78    59.05%     442        Suggestions  Root-Parent  Bad SD-Rule
      es$|noun|ic$|noun         101   0.00%      19         Suggestions  Root-Parent  Bad SD-Rule
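
To make the rule notation concrete, the sketch below reads a rule such as se$|verb|zation$|noun as "replace the word-final suffix se of a verb with zation to obtain a noun". This reading, the function names parse_sd_rule and apply_sd_rule, and the sample words are assumptions made for illustration. It is not the Lexical Tools implementation, and it applies rules in the left-to-right direction only.

```python
from typing import Optional, Tuple


def parse_sd_rule(rule: str) -> Tuple[str, str, str, str]:
    """Split 'se$|verb|zation$|noun' into (suffix, category, suffix, category)."""
    src_suffix, src_cat, dst_suffix, dst_cat = rule.split("|")
    # Drop the trailing '$' end-of-word anchor; a bare '$' becomes an empty suffix.
    return src_suffix.rstrip("$"), src_cat, dst_suffix.rstrip("$"), dst_cat


def apply_sd_rule(word: str, category: str, rule: str) -> Optional[Tuple[str, str]]:
    """Return the derived (word, category), or None if the rule does not match."""
    src_suffix, src_cat, dst_suffix, dst_cat = parse_sd_rule(rule)
    if category != src_cat or not word.endswith(src_suffix):
        return None
    # Remove the source suffix (possibly empty) and append the target suffix.
    stem = word[: len(word) - len(src_suffix)]
    return stem + dst_suffix, dst_cat


if __name__ == "__main__":
    # Hypothetical sample words, chosen only to exercise the two rules.
    print(apply_sd_rule("organise", "verb", "se$|verb|zation$|noun"))
    print(apply_sd_rule("basic", "adj", "$|adj|ally$|adv"))
```

A candidate SD-pair in this sketch is simply an input word paired with the output of apply_sd_rule; whether such a pair counts as valid is decided by the evaluation described here, not by the string operation itself.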

  • 2016 release: 12 new SD-Rules are added to the 2015 release for evaluation.
    • Total candidate SD-pairs: 58,422
    • Total valid candidate SD-pairs: 50,814

    • 1 is duplicated (of an existing rule).
    • 8 (72.73%, 8/11) of them are evaluated as good rules in the optimized set
    • 3 (27.27%, 3/11) are bad rules

    • In the optimized set, 2 child rules are used to replace proposed rules.
    • Details:
      SD-Rule                   Rank  Precision  Instances  Source       Decompose    Results
      Duplicated Rules
      e$|verb|ing$|noun         47    91.47%     211        NOM_D        Root-Parent  Duplicate of a good rule
      Good Rules
      genesis$|noun|genic$|adj  13    99.52%     207        EXP_SUG      3G-Child     Good SD-Rule
      se$|verb|sis$|noun        27    97.87%     141        NOM_D        1G-Child     Good SD-Rule
      sia$|noun|tic$|adj        40    94.17%     103        ORG_D        Root-Parent  Good SD-Rule
      on$|noun|ve$|adj          48    91.46%     1253       ORG_D        Root-Parent  Good SD-Rule
      e$|noun|ic$|adj           49    91.40%     1267       ORG_D        Root-Parent  Good SD-Rule
      $|adj|ism$|noun           51    90.79%     369        NOM_D        Root-Parent  Good SD-Rule
      ation$|noun|ed$|adj       67    83.95%     405        NOM_D        Root-Parent  Good SD-Rule
      $|noun|ship$|noun         70    80.45%     133        ORG_D        Root-Parent  Good SD-Rule
      Bad Rules
      e$|adj|ion$|noun          88    54.60%     359        NOM_D        Root-Parent  Bad SD-Rule
      $|noun|age$|noun          96    36.97%     119        ORG_D        Root-Parent  Bad SD-Rule
      al$|adj|ine$|noun         98    32.65%     49         EXP_SUG      Root-Parent  Bad SD-Rule
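
The good and bad percentages quoted for the 2015 and 2016 releases are taken over the new rules that remain after duplicates are set aside (13 of 15 in 2015, 11 of 12 in 2016). A minimal sketch of that arithmetic follows; the helper name share is an assumption introduced here.

```python
def share(count: int, total: int) -> str:
    """Format count/total as a percentage with two decimals."""
    return f"{100 * count / total:.2f}%"


# (new rules, duplicated, good, bad) as reported for each release
new_rule_results = {
    2015: (15, 2, 11, 2),
    2016: (12, 1, 8, 3),
}

for year, (proposed, duplicated, good, bad) in new_rule_results.items():
    evaluated = proposed - duplicated  # duplicates are not evaluated as new rules
    print(year, "good:", share(good, evaluated), "bad:", share(bad, evaluated))
```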

II. Comparison of SD-Rule set:

Item                                        2014                2015                2016
Baseline Set (includes parent-child rules)  107                 120                 132
Total Unique Rules                          96                  101                 111
Total Good Rules                            73                  76                  82
Total Valid SD-pairs (SD-Facts: relevant)   42,552              46,950              50,814
Optimized System Precision                  95.30%              95.22%              95.00%
Optimized System Recall                     95.01%              95.70%              95.26%
Optimized System Performance                1.9031              1.9093              1.9026
Cutoff Rule                                 ar$|adj|e$|noun     ar$|adj|e$|noun     $|noun|ist$|noun
Optimized Set                               2014 Optimized Set  2015 Optimized Set  2016 Optimized Set
Optimized Diagram
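
The reported performance values agree, to within rounding, with the sum of precision and recall expressed as fractions. That relationship is inferred from the numbers in the table and is not stated on this page, so the sketch below should be read as a consistency check under that assumption rather than as the definition used by Lexical Tools. The function name precision_plus_recall is hypothetical.

```python
def precision_plus_recall(precision_pct: float, recall_pct: float) -> float:
    """Sum precision and recall after converting percentages to fractions."""
    return (precision_pct + recall_pct) / 100


# (precision %, recall %, reported performance) per release
reported = {
    2014: (95.30, 95.01, 1.9031),
    2015: (95.22, 95.70, 1.9093),
    2016: (95.00, 95.26, 1.9026),
}

for year, (precision, recall, performance) in reported.items():
    computed = precision_plus_recall(precision, recall)
    # Inputs are rounded to two decimals, so allow a small tolerance.
    matches = abs(computed - performance) < 0.0002
    print(year, f"{computed:.4f}", performance, matches)
```

Under this reading, 2015 has the highest combined score of the three releases, and the spread across releases is under 0.01.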

For the optimized set:

  • The optimized sets of the 2014 and 2015 releases are similar; see SD-Rule rank mapping, 2014-15 for details.
  • The 2014 optimized set has 96 SD-Rules, 73 of which are good.
  • The 2015 optimized set has 101 SD-Rules, 76 of which are good.
  • The 2016 optimized set has 111 SD-Rules, 82 of which are good.
  • All good rules in 2014 are also in 2015.
  • All good rules in 2015 are also in 2016, except for 1 (ar$|adj|e$|noun).

III. Transaction History:

Columns:

  • Baseline: collected candidate SD-Rules
  • Unique Rules: baseline with child-rules removed
  • Good Rules: used in the Lexical Tools SD-Rule set

2014: Baseline 107, Unique Rules 96, Good Rules 73
  • Unique Rules:
    • removed 11 child-rules from baseline
    • 96 = 107 - 11

New Rules: 15
                       ES (Expert-Suggest)  NOM_D  ORG_D  Sub-Total
  Total Rules          7                    6      2      15
  Duplicated           2                    0      0      2
  Total non-dup-rules  5                    6      2      13
  Bad Rules            2                    0      0      2
  Good Rules           3                    6      2      11

2015: Baseline 120, Unique Rules 101, Good Rules 76
  • Baseline:
    • 2 new rules out of 15 are child-rules of existing rules, not added
    • 120 = 107 + 15 - 2
  • Good Rules:
    • 4 of the good new rules are parent-rules of 4 existing rules (+0)
    • 2 of the good new rules are parent-rules of 4 existing rules (-2)
    • 5 of the good new rules have no parent-rule relationship with existing rules (+5)
    • 76 = 73 + 0 - 2 + 5

New Rules: 12
                       ES (Expert-Suggest)  NOM_D  ORG_D  Sub-Total
  Total Rules          2                    5      5      12
  Duplicated           0                    1      0      1
  Total non-dup-rules  2                    4      5      11
  Bad Rules            1                    1      1      3
  Good Rules           1                    3      4      8

2016: Baseline 132, Unique Rules 111, Good Rules 82
  • Baseline:
    • 1 child-rule (nce$|noun|nt$|adj) was added to an existing rule in 2015
    • 1 new rule out of 12 is duplicated, not added
    • 132 = 120 + 1 + 12 - 1
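
The bookkeeping in this history can be replayed as a short script. The sketch below only restates the equations and per-source counts given above and asserts that they are internally consistent; the variable names are introduced here for illustration.

```python
# Baseline counts, following the equations in the transaction history.
baseline_2014 = 107
unique_2014 = baseline_2014 - 11             # 11 child-rules removed
baseline_2015 = baseline_2014 + 15 - 2       # 15 new rules, 2 not added
good_2015 = 73 + 0 - 2 + 5                   # net effect of the 11 good new rules
baseline_2016 = baseline_2015 + 1 + 12 - 1   # 1 child-rule, 12 new rules, 1 duplicate

# New rules per source: (total, duplicated, bad, good)
new_rules = {
    2015: {"ES": (7, 2, 2, 3), "NOM_D": (6, 0, 0, 6), "ORG_D": (2, 0, 0, 2)},
    2016: {"ES": (2, 0, 1, 1), "NOM_D": (5, 1, 1, 3), "ORG_D": (5, 0, 1, 4)},
}

for year, by_source in new_rules.items():
    for source, (total, duplicated, bad, good) in by_source.items():
        # Every new rule is either duplicated, bad, or good.
        assert total == duplicated + bad + good, (year, source)
    print(year, "new rules:", sum(counts[0] for counts in by_source.values()))

print(unique_2014, baseline_2015, good_2015, baseline_2016)
```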

Details:

In conclusion, the optimized set of SD-Rules is very stable across the three releases, as expected.