Lexical Tools

Retrieve New SD-Rules from OrgD, 2020

I. Description

A set of computer programs (FindSdRulesFromDPairs.java) was developed to find SD-Rules from a set of suffixD pairs (SD-pairs). For each SD-pair, the program identifies and removes the common starting characters of the two words and then generates the SD-Rule automatically from what remains. Please note that only root-parent SD-Rules are generated by this program. Two sets of SD-pairs are used for this task. This page details the new SD-Rules selected from orgD.
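The prefix-stripping step described above can be illustrated with a minimal sketch. This is not the code of FindSdRulesFromDPairs.java: the class name SdRuleSketch, the method names, and the sample word pairs are assumptions made for illustration only. The output follows the rule notation used in the results on this page (suffix$|category|suffix$|category).

```java
// Hypothetical sketch: derive a root-parent SD-Rule from one SD-pair by
// removing the longest common prefix of the two words.
class SdRuleSketch {

    // Number of leading characters shared by both words.
    static int commonPrefixLength(String a, String b) {
        int max = Math.min(a.length(), b.length());
        int i = 0;
        while (i < max && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    // Build the rule string from the two remaining suffixes and categories.
    static String deriveRule(String word1, String cat1, String word2, String cat2) {
        int n = commonPrefixLength(word1, word2);
        return word1.substring(n) + "$|" + cat1 + "|" + word2.substring(n) + "$|" + cat2;
    }

    public static void main(String[] args) {
        // Sample pairs are illustrative, not taken from orgD.yes.S.data.
        System.out.println(deriveRule("careless", "adj", "care", "noun"));        // less$|adj|$|noun
        System.out.println(deriveRule("biologist", "noun", "biology", "noun"));   // ist$|noun|y$|noun
        System.out.println(deriveRule("analgetic", "adj", "analgesia", "noun"));  // tic$|adj|sia$|noun
    }
}
```

Because the entire common prefix is removed, each pair yields the shortest possible suffix pattern, which matches the statement that only root-parent rules are generated.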

II. Procedures

  • Directory: ${SUFFIXD_DIR}
  • Programs:
    • shell>cd ${SUFFIXD_DIR}/bin
    • shell>GetSdRule ${YEAR}
      2
      orgFacts

    • shell>GetSdRule ${YEAR}
      3
      orgFacts

    • shell>GetSdRule ${YEAR}
      5
      year
      rule

III. Results

    • From: ./dataR/SdRulesFromSdPairs/orgFacts/sdRulesFromSdPairs.rpt
    • These are SD-Pairs from orgD (Facts) in Lexicon.2013- (releases from 2014 on have added new rules and thus eliminated some candidate rules).
    • Once new SD-Rules derived from orgD are added to the system, the number of suffixD pairs from orgD decreases, because those pairs are then already generated in suffixD. The 2013 release is therefore used for this study.
    • The file ${ORG_D_DIR}/data/orgD.yes.S.data is used as input.
    • The 4,110 SD-Pairs generate 1,421 SD-Rules.
    • All generated SD-Rules are root-parent rules (without parent-rule).
    • Rules meeting the following criteria are selected:
      • 2015:
        • High frequency: (>= 40)
        • Accumulated coverage: 11.56% (> 11.50%)
        • Individual coverage: 1.31% (> 1.00%)
      • 2016:
        • High frequency: (>= 35)
        • Accumulated coverage: 15.94% (> 15.00%)
        • Individual coverage: 0.85% (> 0.80%)
      • 2017:
        • High frequency: (>= 30)
        • Accumulated coverage: 19.00% (> 19.00%)
        • Individual coverage: 0.73% (> 0.70%)
      • 2020:
        • High frequency: (>= 25)
        • Accumulated coverage: 22.77% (> 22.00%)
        • Individual coverage: 0.61% (> 0.60%)

      • SD-Rules meeting the above criteria (total number of instances: 4,110):
        SD-Rules              Instances No.   Accu. No.       Notes
        less$|adj|$|noun      131 (3.19%)     131 (3.19%)     2013-, exists
        $|verb|ion$|noun      111 (2.70%)     242 (5.89%)     2013-, exists
        ist$|noun|y$|noun     63 (1.53%)      305 (7.42%)     2013-, exists
        ally$|adv|$|adj       58 (1.41%)      363 (8.83%)     2015, with existing child rules
        ful$|adj|$|noun       58 (1.41%)      421 (10.24%)    2013-, exists
        c$|adj|s$|noun        54 (1.31%)      475 (11.56%)    2015, with existing child rules
        2015: frequency >= 40; Accu. coverage: > 11.50%; Ind. coverage: > 1.00%
        ve$|adj|on$|noun      38 (0.92%)      513 (12.48%)    2016, no child rules
        ship$|noun|$|noun     37 (0.90%)      550 (13.38%)    2016, no child rules
        age$|noun|$|noun      35 (0.85%)      585 (14.23%)    2016, no child rules
        ic$|adj|e$|noun       35 (0.85%)      620 (15.09%)    2016, no child rules
        tic$|adj|sia$|noun    35 (0.85%)      655 (15.94%)    2016, no child rules
        2016: frequency >= 35; Accu. coverage: > 15.00%; Ind. coverage: > 0.80%
        fully$|adv|$|noun     32 (0.78%)      687 (16.72%)    2017, no child rules
        ish$|adj|$|noun       32 (0.78%)      719 (17.49%)    2017, no child rules
        y$|noun|$|noun        32 (0.78%)      751 (18.27%)    2017, no child rules
        tous$|adj|$|noun      30 (0.73%)      781 (19.00%)    2017, no child rules
        2017: frequency >= 30; Accu. coverage: > 19.00%; Ind. coverage: > 0.70%
        ory$|adj|ion$|noun    27 (0.66%)      808 (19.66%)    2020, no child rules
        $|adj|s$|noun         26 (0.63%)      834 (20.29%)    2020, no child rules
        ial$|adj|$|noun       26 (0.63%)      860 (20.92%)    2020, no child rules
        ic$|adj|es$|noun      26 (0.63%)      886 (21.56%)    2015, exists
        age$|noun|$|verb      25 (0.61%)      911 (22.17%)    2020, no child rules
        e$|verb|ion$|noun     25 (0.61%)      936 (22.77%)    2015, exists
        2020: frequency >= 25; Accu. coverage: > 22.00%; Ind. coverage: > 0.60%

      • New SD-Rules without child rules:
        • ory$|adj|ion$|noun
        • $|adj|s$|noun
        • ial$|adj|$|noun
        • age$|noun|$|verb
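
The instance and accumulated coverage figures reported in the results can be reproduced with simple arithmetic: individual coverage is a rule's instance count divided by the total of 4,110, and accumulated coverage is the running sum of counts divided by the same total. The sketch below is a hedged illustration of that calculation, not part of GetSdRule. The class name SdRuleCoverageSketch and the method report are hypothetical, and it assumes the rules are already sorted by descending frequency. The sample data are the seven most frequent rules listed on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: compute individual and accumulated coverage for
// SD-Rules and keep those at or above a minimum frequency.
class SdRuleCoverageSketch {

    // The seven most frequent rules and their instance counts from this page.
    static final String[] RULES = {
        "less$|adj|$|noun", "$|verb|ion$|noun", "ist$|noun|y$|noun",
        "ally$|adv|$|adj", "ful$|adj|$|noun", "c$|adj|s$|noun", "ve$|adj|on$|noun"
    };
    static final int[] COUNTS = {131, 111, 63, 58, 58, 54, 38};
    static final int TOTAL = 4110;

    // Assumes counts are sorted in descending order, so the loop can stop
    // at the first rule below the minimum frequency.
    static List<String> report(String[] rules, int[] counts, int total, int minFrequency) {
        List<String> rows = new ArrayList<>();
        int accumulated = 0;
        for (int i = 0; i < rules.length; i++) {
            if (counts[i] < minFrequency) {
                break;
            }
            accumulated += counts[i];
            rows.add(String.format(Locale.ROOT, "%s\t%d (%.2f%%)\t%d (%.2f%%)",
                    rules[i],
                    counts[i], 100.0 * counts[i] / total,
                    accumulated, 100.0 * accumulated / total));
        }
        return rows;
    }

    public static void main(String[] args) {
        // 2015 criterion: frequency >= 40.
        for (String row : report(RULES, COUNTS, TOTAL, 40)) {
            System.out.println(row);
        }
    }
}
```

With a minimum frequency of 40, the sketch keeps the first six rules and stops at ve$|adj|on$|noun (38 instances). The last row kept shows 475 accumulated instances, or 11.56%, which matches the 2015 figures.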