Results of optimized set - 2021
I. The optimized set
As a result, we concluded that case 35.3 is the final optimized set of SD-rules for the Lexicon 2021 corpus. It includes 104 (out of 148) SD-rules to reach:
This set of SD-rules is used in the Lexical Tools SD-Rule Trie because it is expected to reach the same system performance when applied to other English corpora, under the assumption that:
II. The methodology
This approach finds the best set of SD-rules from a set of known candidate SD-rules.
In theory, a complete set of SD-rules can be obtained as more SD-rules are evaluated and added. This methodology provides a systematic approach to:
The rule esis$|noun|ic$|adj is evaluated. Its 3rd-generation child rule, genesis$|noun|genic$|adj, is selected through this method. This is the same as what linguists suggest:
III. The target precision and recall rate (95%)
The system precision and system recall curves of the final set intersect (the optimization point) at 95%. To reduce noise, we also smoothed both curves with a simple moving average over window sizes of 3, 5, and 7 rules, and found that the intersections are around 95% in all cases (see the diagrams below). Smoothing the data captures the characteristics of the set while leaving out the noise. Accordingly, our target minimum accuracy rate of 95% is a good choice for obtaining an optimized set of SD-rules (close to the optimum).
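The smoothing and intersection steps described above can be sketched as follows. This is a minimal illustration, not the Lexical Tools implementation: the function names and the sample precision and recall values are hypothetical, and locating the crossing by linear interpolation is an assumption about how the intersection might be read off the curves.

```python
from typing import List, Optional, Tuple


def simple_moving_average(values: List[float], window: int) -> List[float]:
    """Return one average per full window (len(values) - window + 1 points)."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def first_crossing(precision: List[float],
                   recall: List[float]) -> Optional[Tuple[float, float]]:
    """Return (position, rate) of the first point where the two curves meet.

    The position is a fractional index into the input lists; the rate is
    found by linear interpolation between neighboring points.
    """
    n = min(len(precision), len(recall))
    for i in range(n):
        d0 = precision[i] - recall[i]
        if d0 == 0:
            return (float(i), precision[i])
        if i + 1 < n:
            d1 = precision[i + 1] - recall[i + 1]
            if d0 * d1 < 0:  # sign change: the curves cross between i and i+1
                t = d0 / (d0 - d1)
                rate = precision[i] + t * (precision[i + 1] - precision[i])
                return (i + t, rate)
    return None


# Hypothetical sample data (NOT the Lexicon 2021 results): precision falls
# and recall rises as more SD-rules are added to the set.
SAMPLE_PRECISION = [0.990, 0.985, 0.978, 0.975, 0.966, 0.960, 0.950,
                    0.945, 0.934, 0.930, 0.921, 0.915, 0.905]
SAMPLE_RECALL = [0.900, 0.912, 0.918, 0.930, 0.934, 0.945, 0.950,
                 0.958, 0.962, 0.971, 0.975, 0.981, 0.990]

if __name__ == "__main__":
    print("raw:", first_crossing(SAMPLE_PRECISION, SAMPLE_RECALL))
    for window in (3, 5, 7):
        p = simple_moving_average(SAMPLE_PRECISION, window)
        r = simple_moving_average(SAMPLE_RECALL, window)
        print(f"{window}-point avg:", first_crossing(p, r))
```

With this sample data, the raw curves and each smoothed pair cross close to 0.95. Note that the position returned for a smoothed pair is an index into the smoothed lists, which are shorter than the raw lists by `window - 1` points.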
Please refer to the document on generating diagrams for the optimal set to generate the following diagrams.
System Precision vs. Recall Rate
System Precision vs. Recall Rate, 3-point Avg.
System Precision vs. Recall Rate, 5-point Avg.
System Precision vs. Recall Rate, 7-point Avg.