
Lexical Tools

Add/Evaluate SD-Rules

This section describes how to add and evaluate an SD-Rule. Once this step is done, an optimized SD-Rule set must be derived as the next step.

  • Sd-Rules sources:
    • Original SD-Rules (done)
    • Derived from high frequency nomD-Pairs
      ln -sf ${NOM_D_DIR}/data/${YEAR}/data/nomD.yes.data.type.S to nomD.yes.S.data
      shell> cd ${SUFFIX_D_DIR}/bin
      shell> GetSdRule ${YEAR}
      2
      nomD
    • Derived from high frequency original SD-Facts
      ln -sf ../../2013/data/orgD.yes.S.data to ${ORG_D_DIR}/data/${YEAR}/data/orgD.yes.S.data
      shell> cd ${SUFFIX_D_DIR}/bin
      shell> GetSdRule ${YEAR}
      2
      orgFacts
    • Suggested by users, experts, and linguists.
    • Derived from WordNet

  • Verify:
    • Test each new rule on the previous SD-Rule set:
      • Set up the following data before running:
        • ${SUFFIX_D_DIR}/data/${YEAR}/dataOrg/sdRules.data.${YEAR}
          shell> ${SUFFIX_D_DIR}/bin/GetSuffixD ${YEAR}
          0
        • ${SUFFIX_D_DIR}/data/${YEAR}/dataR/SdRulesCheck/${YEAR}
          shell> mkdir -p ${SUFFIX_D_DIR}/data/${YEAR}/dataR/SdRulesCheck/${YEAR}
      • Check for duplicate, parent, and child rules:
        shell> cd ${SUFFIX_D_DIR}/bin
        shell> GetSdRule ${YEAR}
        5
        SD_YEAR (${YEAR})
        TEST_RULE (es$|noun|ic$|noun)
        => Make sure the output reports:
           -- It is a root parent-rule (OK)!
           -- Good: no parents, duplicated, children rules found!
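The TEST_RULE above uses the form suffix-1|cat1|suffix-2|cat2. As a rough illustration of how such a rule can be read, the sketch below strips suffix-1 from a word and appends suffix-2. It assumes that the trailing `$` marks the end of the word; the function name `apply_sd_rule` and the sample word are hypothetical and are not part of GetSdRule.

```shell
# Hypothetical helper (not part of the Lexical Tools): apply a rule of the
# form suffix-1|cat1|suffix-2|cat2 to a word and print "derived-word|cat2".
# Assumption: the trailing "$" in each suffix marks the end of the word.
# Note: variables are global because plain POSIX sh has no "local".
apply_sd_rule() {
  rule=$1
  word=$2
  # Split the first four pipe-separated fields with parameter expansion.
  suf1=${rule%%|*}; rest=${rule#*|}
  cat1=${rest%%|*}; rest=${rest#*|}
  suf2=${rest%%|*}; rest=${rest#*|}
  cat2=${rest%%|*}
  # Drop the end-of-word marker from both suffixes.
  suf1=${suf1%\$}
  suf2=${suf2%\$}
  case $word in
    *"$suf1")
      stem=${word%"$suf1"}
      printf '%s|%s\n' "$stem$suf2" "$cat2"
      ;;
    *)
      # The word does not end with suffix-1, so the rule does not apply.
      return 1
      ;;
  esac
}

# Example call with the test rule from this page and an illustrative word:
apply_sd_rule 'es$|noun|ic$|noun' 'diabetes'
```

This only mimics the suffix substitution; whether a generated pair is a valid derivation is still decided by the tagging steps described on this page.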
        		
    • Manually add non-duplicated new rules to ./dataOrg/sdRules.data.${YEAR}
      • Verify the updated sdRules.data.${YEAR}
        shell> cd ${SUFFIX_D_DIR}/bin
        shell> GetSdRule ${YEAR}
        4
        year
        ${YEAR}
        => Follow the on-screen instructions to make sure the result is OK

        After adding new SD-Rules to the SD-Rule set, run the program to standardize the set:

      • Standardize the Sd-Rule set in lexicographical and alphabetic order
        shell> cd ${SUFFIX_D_DIR}/bin
        shell> GetSdRule ${YEAR}
        3
        others
        Sd-Rule file (./data/Org/sdRules.data.${YEAR})

        PS. You might need to run step 2 first.
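After standardizing, a quick independent check of the rule file can be useful. The sketch below assumes that the standardized order is plain byte order with one rule per line and no duplicate lines; the exact ordering GetSdRule produces may differ, and `is_standardized` is a hypothetical helper name.

```shell
# Hypothetical helper (not part of the Lexical Tools): succeeds only if the
# file is already sorted in byte order and contains no duplicate lines.
# Assumption: byte order (LC_ALL=C) matches the order GetSdRule produces.
is_standardized() {
  LC_ALL=C sort -c -u "$1" 2>/dev/null
}

# Usage (path taken from this page, run from the directory that holds dataOrg):
#   is_standardized ./dataOrg/sdRules.data.${YEAR} && echo "sorted, no duplicates"
```

A failing check does not necessarily mean the file is wrong; it may only mean that the assumed ordering differs from the one the program uses.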

    • Get the SD-pairs list:
      • Get SD-pairs for each new SD-Rule:
        shell> cd ${SUFFIX_D_DIR}/bin
        shell> GetSuffixD ${YEAR}
        10
        suffix-1|cat1|suffix-2|cat2|status|source|relation

        Save ./data/suffixD.tbd.data.option to ./data/newRules
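Option 10 above takes a rule as seven pipe-separated fields (suffix-1|cat1|suffix-2|cat2|status|source|relation). When several new rules are prepared in advance, a field-count check can catch malformed lines before they are entered. The sketch below assumes a hypothetical text file with one rule per line; `check_rule_fields` is not part of GetSuffixD.

```shell
# Hypothetical helper (not part of the Lexical Tools): report every line that
# does not have exactly 7 pipe-separated fields; exit non-zero if any is found.
check_rule_fields() {
  awk -F'|' '
    NF != 7 { printf "line %d: %d fields\n", NR, NF; bad = 1 }
    END     { exit bad }
  ' "$1"
}

# Usage (newRules.txt is a hypothetical file of prepared rules):
#   check_rule_fields newRules.txt || echo "fix the lines listed above"
```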

    • Send above SD-pair lists to linguists to tag:
      This step should be run after tagging of the new SD candidates from Lexicon updates is complete.
      After receiving the tags from the linguists, update ./dataOrg/sdRules.data and ./dataOrg/suffixD.tag.txt, then continue with the next step.
    • SD-Rules evaluation and optimization: documents the steps above. SD-Rule optimization must be performed whenever new SD-Rules are added to the SD-Rule set.

  • Add/Evaluate Sd-Rules log