Lexical Tools

Add/Evaluate SD-Rules

This section describes how to add and evaluate an SD-Rule. Once this step is done, the next step is to derive an optimized SD-Rule set.

  • Sd-Rules sources:
    • Original SD-Rules (done)
    • Derived from high frequency nomD-Pairs
      ln -sf ${NOM_D_DIR}/data/${YEAR}/data/nomD.yes.data.type.S to nomD.yes.S.data
      shell> cd ${SUFFIX_D_DIR}/bin
      shell> GetSdRule ${YEAR}
      2
      nomD
    • Derived from high frequency original SD-Facts
      ln -sf ../../2013/data/orgD.yes.S.data to ${ORG_D_DIR}/data/${YEAR}/data/orgD.yes.S.data
      shell> cd ${SUFFIX_D_DIR}/bin
      shell> GetSdRule ${YEAR}
      2
      orgFacts
    • Suggested by users, experts, and linguists.
    • Derived from WordNet
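
The two `ln -sf ... to ...` lines above are not literal shell commands. A minimal shell sketch of that linking step follows, assuming the path before `to` is the link target and the path after it is the link name. `link_data` is a hypothetical helper for illustration, not part of the Lexical Tools scripts.

```shell
# Hypothetical helper: create (or replace) a symbolic link, but only when
# the source file exists, so a mistyped path fails loudly.
link_data() {
    src=$1
    dst=$2
    # A relative link target is resolved from the directory of the link,
    # so check it from there rather than from the current directory.
    case $src in
        /*) check=$src ;;
        *)  check=$(dirname "$dst")/$src ;;
    esac
    if [ ! -e "$check" ]; then
        echo "missing source: $src" >&2
        return 1
    fi
    ln -sf "$src" "$dst"
}

# Example calls, using the paths given in the list above:
# link_data "${NOM_D_DIR}/data/${YEAR}/data/nomD.yes.data.type.S" nomD.yes.S.data
# link_data ../../2013/data/orgD.yes.S.data "${ORG_D_DIR}/data/${YEAR}/data/orgD.yes.S.data"
```

The existence check is only a convenience: plain `ln -sf` creates a dangling link silently when the source path is wrong.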

  • Verify:
    • Test each new rule on the previous SD-Rule set:
      • Set up the following data to run:
        • ${SUFFIX_D_DIR}/data/${YEAR}/dataOrg/sdRules.data.${YEAR}
          shell> ${SUFFIX_D_DIR}/bin/GetSuffixD ${YEAR}
          0
        • ${SUFFIX_D_DIR}/data/${YEAR}/dataR/SdRulesCheck/${YEAR}
          shell> mkdir -p ${SUFFIX_D_DIR}/data/${YEAR}/dataR/SdRulesCheck/${YEAR}
      • Check duplication, parents, child rules:
        shell> cd ${SUFFIX_D_DIR}/bin
        shell> GetSdRule ${YEAR}
        5
        SD_YEAR (${YEAR})
        TEST_RULE (es$|noun|ic$|noun)
        => Make sure the result is:
           -- It is a root parent-rule (OK)!
           -- Good: no parents, duplicated, children rules found!
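
The TEST_RULE above has four pipe-separated fields (suffix-1|cat1|suffix-2|cat2). As a quick pre-check before running GetSdRule, the following sketch splits such a rule and looks for an exact duplicate in a rule file. It assumes one four-field rule per line, which may not match the real sdRules.data layout. Both functions are hypothetical helpers, and they do not reproduce the parent/child analysis of option 5.

```shell
# Hypothetical helper: split a rule such as 'es$|noun|ic$|noun' into fields.
split_rule() {
    IFS='|' read -r s1 c1 s2 c2 rest <<EOF
$1
EOF
    printf 'suffix-1=%s cat1=%s suffix-2=%s cat2=%s\n' "$s1" "$c1" "$s2" "$c2"
}

# Hypothetical helper: succeed when the rule file already contains the rule.
rule_is_duplicate() {
    rule=$1
    file=$2
    # -F: fixed string, so '$' and '|' are literal; -x: match the whole line.
    grep -Fxq -e "$rule" "$file"
}

# Example:
# split_rule 'es$|noun|ic$|noun'
#   prints: suffix-1=es$ cat1=noun suffix-2=ic$ cat2=noun
# rule_is_duplicate 'es$|noun|ic$|noun' ./dataOrg/sdRules.data.${YEAR} && echo duplicate
```

Fixed-string matching matters here because `$` and `|` are regular-expression metacharacters.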
        		
    • Manually add non-duplicated new rules to ./dataOrg/sdRules.data.${YEAR}
      • Verify the updated sdRules.data.${YEAR}
        shell> cd ${SUFFIX_D_DIR}/bin
        shell> GetSdRule ${YEAR}
        4
        year
        ${YEAR}
        => Follow the on-screen instructions to make sure the result is OK

        After adding new SD-Rules to the SD-Rule set, the following program must be run to standardize the set:

      • Standardize the Sd-Rule set in lexicographical and alphabetic order
        shell> cd ${SUFFIX_D_DIR}/bin
        shell> GetSdRule ${YEAR}
        3
        others
        Sd-Rule file (./dataOrg/sdRules.data.${YEAR})

        Note: step 2 might need to be run first

    • Get the SD-pairs list:
      • Get SD-pairs for each new SD-Rule:
        shell> cd ${SUFFIX_D_DIR}/bin
        shell> GetSuffixD ${YEAR}
        10
        suffix-1|cat1|suffix-2|cat2|status|source|relation

        Save ./data/suffixD.tbd.data.option to ./data/newRules

    • Send the above SD-pair lists to linguists to tag:
      Run this step after tagging is complete for the new SD candidates from Lexicon updates.
      After receiving the tags from the linguists, update ./dataOrg/sdRules.data and ./dataOrg/suffixD.tag.txt, then continue with the next step.
    • SD-Rules evaluation and optimization: documents the above steps. SD-Rule optimization must be performed whenever new SD-Rules are added to the SD-Rule set.

  • Add/Evaluate Sd-Rules log