SD-Rules Optimization Goal
To reach the best system performance (accuracy rate and coverage rate), a systematic approach needs to be developed to:
First, the collected SD-Rules set needs to be refined and optimized to reach the best accuracy and coverage rates in the following cases:
able$|adj|ability$|noun is normalized to ability$|noun|able$|adj
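The normalization example above can be sketched in code. This is only an illustration under one loudly stated assumption: that the canonical form places the two (suffix, category) halves in lexicographic order. That ordering is consistent with every rule shown on this page, but the source does not state the actual criterion. The function name normalize_rule is hypothetical.

```python
def normalize_rule(rule: str) -> str:
    """Return an SD-Rule with its two (suffix, category) halves in a canonical order.

    ASSUMPTION: the canonical order is lexicographic on (suffix, category).
    The source only shows one example of normalization, not the criterion.
    """
    suffix_a, cat_a, suffix_b, cat_b = rule.split("|")
    first, second = sorted([(suffix_a, cat_a), (suffix_b, cat_b)])
    return "|".join(first + second)


if __name__ == "__main__":
    print(normalize_rule("able$|adj|ability$|noun"))
```

With this ordering, both able$|adj|ability$|noun and ability$|noun|able$|adj map to the same string, so the two directions of a rule are counted only once.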
a$|noun|an$|noun has only 1 valid SD-Pair out of 273 raw SD-Pairs from the Lexicon. The accuracy rate is 0.37% with only 1 valid instance (coverage). Such an SD-Rule is expected to generate more invalid SD-Pairs than valid ones if it is applied to general English (outside the Lexicon) and thus should be removed from the optimized set.
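The arithmetic behind this example can be checked with a short sketch. The accuracy rate is taken here as valid SD-Pairs divided by raw SD-Pairs, which matches the 1 out of 273 figure above. The 50% cutoff in is_bad_rule is an assumption chosen to mirror "more invalid SD-Pairs than valid ones"; the source does not give a numeric threshold, and both function names are hypothetical.

```python
def accuracy_rate(valid_pairs: int, raw_pairs: int) -> float:
    """Fraction of raw SD-Pairs produced by a rule that are valid."""
    if raw_pairs <= 0:
        raise ValueError("raw_pairs must be positive")
    return valid_pairs / raw_pairs


def is_bad_rule(valid_pairs: int, raw_pairs: int, min_accuracy: float = 0.5) -> bool:
    """Flag a rule whose accuracy falls below min_accuracy.

    ASSUMPTION: 0.5 is an illustrative cutoff (more invalid than valid pairs),
    not a threshold stated in the source.
    """
    return accuracy_rate(valid_pairs, raw_pairs) < min_accuracy


if __name__ == "__main__":
    rate = accuracy_rate(1, 273)          # the a$|noun|an$|noun example
    print(f"{rate:.2%}", is_bad_rule(1, 273))
```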
a$|noun|an$|noun is the parent-rule of ia$|noun|ian$|noun. In other words, a parent-rule covers all SD-Pairs generated by its child-rules. There are two ways of refinement:
sis$|noun|tic$|adj has two child-rules, osis$|noun|otic$|adj and esis$|noun|etic$|adj, from the original SD-Rules set. Both child-rules need to be evaluated by comparing their system performance against that of the parent-rule. Child-rules should be included only if they are good rules (that is, they have good accuracy and coverage rates; see the next section for details on the evaluation procedures).
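The parent-child relation used in these examples can be sketched as a suffix check. This is a minimal illustration, not the procedure used by the authors. It assumes that both rules are written in the same normalized order and that a child-rule extends both suffixes of its parent with the same leading characters (for example, "o" in osis$/otic$). The names parse_rule and is_parent_rule are hypothetical.

```python
def parse_rule(rule: str):
    """Split 'sis$|noun|tic$|adj' into ('sis', 'noun', 'tic', 'adj')."""
    suffix_a, cat_a, suffix_b, cat_b = rule.split("|")
    return suffix_a.rstrip("$"), cat_a, suffix_b.rstrip("$"), cat_b


def is_parent_rule(parent: str, child: str) -> bool:
    """True if every SD-Pair matched by child would also be matched by parent.

    ASSUMPTION: both rules use the same normalized order of their halves.
    """
    p_suf_a, p_cat_a, p_suf_b, p_cat_b = parse_rule(parent)
    c_suf_a, c_cat_a, c_suf_b, c_cat_b = parse_rule(child)
    if (p_cat_a, p_cat_b) != (c_cat_a, c_cat_b):
        return False
    if not (c_suf_a.endswith(p_suf_a) and c_suf_b.endswith(p_suf_b)):
        return False
    # The extra leading characters must be non-empty and identical on both sides.
    extra_a = c_suf_a[: len(c_suf_a) - len(p_suf_a)]
    extra_b = c_suf_b[: len(c_suf_b) - len(p_suf_b)]
    return extra_a != "" and extra_a == extra_b


if __name__ == "__main__":
    for child in ("osis$|noun|otic$|adj", "esis$|noun|etic$|adj"):
        print(child, is_parent_rule("sis$|noun|tic$|adj", child))
```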
Optimization Goal
The goal is to find a good set of rules from the known SD-Rules (by removing bad rules) that has the best system performance and meets the following criteria:
System performance
If we arrange all SD-Rules by the following order:
, where the system accuracy and coverage rate can be defined as: