The SPECIALIST Lexicon

Derivations Procedures - suffixD

Generate suffixD pairs in derivation table:

I. Directory: ${DERIVATION}/3.suffixD

II. Input Files (./data/${YEAR}/dataOrg/):
shell> ${SUFFIX_D}/bin/GetSuffixD ${YEAR}
0

  • link LEXICON to LEXICON.${YEAR} (from ${LEXICON_DIR}/LEXICON.release)
  • link inflVars.data to inflVars.data.${YEAR} (from ${LEXICON_DIR})
  • link bases.data from prefixD/data/ (Complete step-1 in prefixD first)
  • link sdRules.data to sdRules.data.${YEAR} (from ${PREV_YEAR} and new rules)
  • link suffixD.tag.txt to suffixD.tag.txt.${YEAR} (copy from previous year)

  • touch/create suffixD.meta.data.conflict.tag.data for init phase

  • Must complete nomD first for auto-tag program to work
  • Must run prefixD step-1 first to get bases.data
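
The checklist above can be verified before running Step 0. The following is a minimal sketch in Python, assuming every linked name listed above sits directly under the dataOrg directory. The function name `missing_inputs` and the constant `REQUIRED_INPUTS` are hypothetical and are not part of the Lexicon tools.

```python
from pathlib import Path

# File names are taken from the checklist above.
# Assumption: they all live directly under ./data/${YEAR}/dataOrg/.
REQUIRED_INPUTS = [
    "LEXICON",
    "inflVars.data",
    "bases.data",
    "sdRules.data",
    "suffixD.tag.txt",
    "suffixD.meta.data.conflict.tag.data",
]


def missing_inputs(data_org):
    """Return the required input names that do not exist under data_org.

    Path.exists() follows symbolic links, so a dangling link counts as missing.
    """
    root = Path(data_org)
    return [name for name in REQUIRED_INPUTS if not (root / name).exists()]
```

An empty list means all inputs are in place. A non-empty list names the links or files that still have to be created.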

III. Final file for allD (release)

  • ${TAR_DIR}/suffixD.yes.data.${YEAR}

IV. Summary of GetSuffixD

Step | Description and Program | Input | Output | Notes
0
  • Prepare directories and files
Input: See section II. Output: See section II.
  • 3.suffixD/data/${YEAR}/dataOrg
    • LEXICON
    • inflVars.data
    • bases.data
    • sdRules.data
    • suffixD.tag.txt
1
  • Retrieve std-raw suffixD pairs
  • GetSuffixDRawFromBaseFile.java
  • ${SRC_DIR}:
    • bases.data
    • sdRules.data
  • suffixD.raw.data.fromBase.all
  • sdRules.rawNo.rpt
  • Must complete prefixD Step-1 to get bases.data
  • Need to rerun from this step if there are new Sd-Rules involved
    • Add new SD-Rules to ./dataOrg/sdRules.data.${YEAR}
    • Get sd-pair (TBD) for each new sdRules
    • Send TBD to linguist to tag [yes|no] from the following steps
    • Save new tag result to ./dataOrg/newRuleTag/
    • Add new tag result to ./dataOrg/suffixD.tag.txt.${YEAR}
2
  • Combine with nomD.S file (raw)
  • CheckWithNomDFile.java
  • ${NOM_TAR_DIR}:
    • nomD.yes.S.data.${YEAR}

  • ${TAR_DIR}:
    • suffixD.raw.data.fromBase
  • suffixD.raw.data.fromNomD
  • suffixD.raw.data
  • Must link suffixD.raw.data.fromBase to suffixD.raw.data.fromBase.all to run this step
3
  • Add tags to suffixD meta file
  • GetSuffixDMetaFile.java
  • DPairTagList.java
  • ${NOM_TAR_DIR}:
    • nomD.yes.S.data.${YEAR}

  • ${SRC_DIR}:
    • suffixD.tag.txt (suffixD.tag.txt.${YEAR}.uSort)

  • ${TAR_DIR}:
    • suffixD.raw.data
  • suffixD.meta.data
  • suffixD.meta.data.conflict

  • 1. Read and fix sdPair tags from tag file
    • Remove duplicate and conflicting tags from ./dataOrg/suffixD.tag.txt
    • Use uSort (shell> sort -u suffixD.tag.txt > suffixD.tag.txt.usort)
    • => After uSort, the duplicated tag no. should equal the conflict tag no. (exact duplicates are removed by sort -u).
    • Go through the duplicated tags and conflict tags and fix them until both counts are 0
    • Conflict tags (same pair, different tags) must be fixed: send them to linguists to re-tag.
  • 2. Read and add sdPair tags from nomD file
    • Ignore the long list of duplicated tags (between manual tags and nomD tags) in log.3
    • Check and fix the total conflict tag no. (conflicts between nomD and the expert's tags)
  • 3. Verify and fix conflict tags from spVars
    • The file (suffixD.meta.data.conflict) lists suffixD tag conflicts caused by SpVars between two records
    • Ideally, all suffixD tags should be consistent among SpVars between records
    • In the 1st run (before tags are added for the annual update), no conflict should exist. That is, skip Step-9 and go to Step-4 for the 1st run.
    • The suffixD.meta.data.conflict should be empty (except for 1 known exception)
    • There is a known exception (since 2014+):
      1|E0056852|E0234312|both
      # 20092|space|noun|E0056852|spacey|adj|E0234312|no
      # 38379|space|noun|E0056852|spacy|adj|E0234312|yes
      

      => This known exception was corrected in 2023+ and changed to yes.
    • If not empty, send to linguists to tag (yes|no|both) on the EUI lines:
      • yes: all suffixD tags among SpVars between records are valid
      • no: all suffixD tags among SpVars between records are invalid
      • both: suffixD tags among SpVars between records include valid and invalid (exception)
    • Run the next step (9) to resolve conflicts and update the results to suffixD.tag.txt automatically, then re-run this step (3) until all exceptions are known
    • make sure:
      • Empty line no = 0
      • Invalid tag no = 0
        These are new suffixD pairs that need to be tagged (handled in Step-4).
      • conflict (yes|no) tag no = 0
      • none (tbd) tag no = 0
    • If all conflict exceptions are known (fixed), go to step-4
9
  • Auto-fix suffixD.tag.txt
  • FixConflictDPairTags.java
  • ${SRC_DIR}:
    • suffixD.tag.txt.${YEAR}
    • suffixD.meta.data.conflict.tag.data
  • ${SRC_DIR}:
    • suffixD.tag.txt.${YEAR}.fixDPair
  • Make sure to save the linguists' tagging results to ./dataOrg/suffixD.meta.data.conflict.tag.data
  • Manually examine ./dataOrg/suffixD.tag.txt.${YEAR}.fixDPair
  • If suffixD.tag.txt.${YEAR}.fixDPair passes the examination, move it to suffixD.tag.txt.${YEAR}, then re-run Step-3.
4
  • Split suffixD meta file (yes|no|tbd)
  • SplitSuffixDMetaFile.java
  • ${TAR_DIR}:
    • suffixD.meta.data
  • suffixD.yes.data
  • suffixD.no.data
  • suffixD.tbd.data
  • suffixD.tbd.data.sort (sent to linguists)
  • suffixD.yesNo.data
  • Make sure suffixD.tbd.data(.sort) is empty. If not, send to linguists to tag:
    • Tag suffixD: (yes|no)
      • valid suffixD: yes
      • invalid suffixD: no
  • Append (update) these newly tagged sd-pairs (to ./dataOrg/suffixD.tag.txt) and rerun steps: 3~4
    • Add [tbd] where tags are missing so that step-3 passes.
4a
  • Clean up tags on tagged file
  • CleanUpDPairTagList.java
  • ${SRC_DIR}:
    • suffixD.tag.txt
  • ${SRC_DIR}:
    • suffixD.tag.txt.cleanUp
Re-run this step until the counts at the end of the log.4a file meet the following:
  • duplicate = 0. If not, replace suffixD.tag.txt with suffixD.tag.txt.cleanUp
  • conflict = 0. If not, send conflicts (from log.5a) to linguists to re-tag. Do NOT replace suffixD.tbd.data with suffixD.tbd.data.cleanUp until conflict = 0
  • diff = 0. If not, replace suffixD.tbd.data with suffixD.tbd.data.cleanUp
  • Then, rerun Steps: 3~4 until it is empty and passes all checks in Steps 3~4.
5
  • Verify dType on suffixD.yes.data
  • DType.java
  • ${ALL_SRC_DIR}:
    • LRSPL
    • dTypeStr.data

  • ${TAR_DIR}:
    • suffixD.yes.data
  • suffixD.yes.data.type
  • suffixD.yes.data.type.Z
  • suffixD.yes.data.type.S
  • suffixD.yes.data.type.P
  • suffixD.yes.data.type.ZS
  • suffixD.yes.data.type.SS
  • suffixD.yes.data.type.PS
  • suffixD.yes.data.type.U
  • Make sure unknown dType (|U|) from suffixD is empty
  • Must finish all new SD-rules (if any) before proceeding with this step
6
  • Automatically add negation tag [N|O], ~less$ is [N], others are [O]
    then sort uniquely
  • AddNegationTagToFile.java
  • DPairTagList.java
  • ${TAR_DIR}:
    • suffixD.yes.data
  • suffixD.yes.data.${YEAR}
  • suffixD.yes.data.${YEAR}.conflict
  • The conflict file (suffixD.yes.data.${YEAR}.conflict) lists all inconsistent suffixD tags between SpVars in two records
    • Send conflicts to linguists to tag (N|O|B) on EUI lines
    • In the past, there have been no "both" (B) cases in suffixD
    • Manually update the results to suffixD.tag.txt
    • Rerun Steps: 3~6 until no unknown conflict (both) exists.
7
  • Check affix on suffixD.yes.data.${YEAR}
  • CheckDerivationByAffix6.java
  • ${ALL_SRC_DIR}:
    • LRSPL

  • ${SRC_DIR}:
    • suffixD.tagYes.txt

  • ${TAR_DIR}:
    • suffixD.yes.data.${YEAR}
  • suffixD.pattern3.rpt
  • copy ${SRC_DIR}/suffixD.tagYes.txt.${PREV_YEAR} ${SRC_DIR}/suffixD.tagYes.txt.${YEAR}
  • suffixD.pattern3.rpt must be empty.
  • This rpt lists all potentially invalid dPairs by checking the first and last 3 characters on the affix.
  • If not empty, send to linguists to tag (Yes|No):
    • invalid dPair (No): add to suffixD.tagNo.txt (not used!). This should not happen!
    • valid dPair (Yes): add to suffixD.tagYes.txt, then rerun Step: 7
8
  • Steps 1 ~ 7
Input: See above. Output: See above. Note: Not recommended!
Other options
11
  • Get stats for SD-rule
    ALL
  • GetSdRuleStatsFromTaggedSuffixD.java
  • ${SRC_DIR}:
    • sdRules.data
  • ${TAR_DIR}:
    • suffixD.meta.data
  • sdRules.stats.rpt
  • sdRules.stats.detail.rpt
Only Use for LVG SD-Rules
  • Used for analysis in finding the optimal Sd-Rules set, please refer to the design documents (SD-Rules evaluation/optimization) of Lexical Tools
12
  • Get the HTML files
    ALL
  • GetSdRuleListHtmlFile.java
  • ${SRC_DIR}:
    • sdRules.data
  • ${TAR_DIR}:
    • suffixD.meta.data
  • ${HTML_DIR}:
    • suffixDRules.html
    • SD-Examples
    • SD-Exceptions
Copy to ${LEXICON_WEB} for annual Sd-Rules updates
  • SD-Examples
  • SD-Exceptions
  • suffixDRules.html
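
The duplicate and conflict counts checked in Steps 3 and 4a can be illustrated with a short sketch. It assumes that each line of the tag file is pipe-delimited and that the tag is the last field, as in the known-exception lines shown under Step 3. The function name `check_tags` is hypothetical. This is not the logic of GetSuffixDMetaFile.java or CleanUpDPairTagList.java, only an approximation of what their log counts describe.

```python
from collections import Counter, defaultdict


def check_tags(lines):
    """Count duplicate and conflicting tags in pipe-delimited tag lines.

    Assumed line format: <pair fields>|<tag>, with the tag as the last field.
    Returns (duplicate_count, conflicts), where conflicts maps each pair key
    to the sorted list of different tags found for it.
    """
    records = []
    for line in lines:
        rec = line.strip()
        # Skip empty lines and comment lines.
        if rec and not rec.startswith("#"):
            records.append(rec)

    # Duplicates: identical lines beyond their first occurrence
    # (these are the lines that `sort -u` removes).
    duplicate_count = sum(n - 1 for n in Counter(records).values())

    # Conflicts: the same pair carrying more than one distinct tag.
    tags_by_pair = defaultdict(set)
    for rec in records:
        key, _, tag = rec.rpartition("|")
        tags_by_pair[key].add(tag)
    conflicts = {k: sorted(t) for k, t in tags_by_pair.items() if len(t) > 1}
    return duplicate_count, conflicts
```

Exact duplicates can be dropped mechanically, whereas every entry in `conflicts` needs a linguist's decision before the tag file is clean.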

V. Processes Details:

  • shell>cd ${DERIVATION}/suffixD/bin
  • shell>GetSuffixD ${YEAR}

    1: Retrieve std-raw suffixD pairs or
    => generate:

    • ./data/sdRules.rawNo.rpt
    • ./data/suffixD.raw.data.fromBase.all

    2: Check/integrate with nomD.S file (raw)
    => ln -s ./suffixD.raw.data.fromBase.all suffixD.raw.data.fromBase
    => generate:

    • ./data/suffixD.raw.data.fromNomD
    • ./data/suffixD.raw.data (= suffixD.raw.data.fromBase + suffixD.raw.data.fromNomD, has comment line #)

    3: Add tags to suffixD meta file (meta)
    => generate ./data/suffixD.meta.data (comment lines # are removed from raw)

    • Make sure there is no duplicated tag in ./dataOrg/suffixD.tag.txt
    • Program automatically tags nomD.S as valid suffixD pairs
    • Duplicated dPairs are OK (from nomD)
    • Correct all conflict dPairs (from nomD)
      => verify with linguists

    3.1: Verify suffixD meta file (meta)
    => Check consistency on derivational tag between 2 records with SpVars
    => generate ./data/suffixD.meta.data.conflict

    • All conflict EUI pairs need to be manually reviewed and then update the tag in ./dataOrg/suffixD.tag.txt

    4: Split suffixD meta file (yes|no|tbd)
    => generates

    • ./data/suffixD.yes.data
    • ./data/suffixD.no.data
    • ./data/suffixD.tbd.data (should be 0 if the annual update is completed)
      => send to linguists to tag for this annual update, then add the updates to ./dataOrg/suffixD.tag.txt.${YEAR}

    • Duplicated SD pairs are normal because they are generated from parent-child candidate SD-rules.

    5: Verify dType on suffixD.yes.data
    => generates:

    • ./data/suffixD.yes.data.type
      • ./data/suffixD.yes.data.type.Z (must be 0)
      • ./data/suffixD.yes.data.type.P (must be 0)
      • ./data/suffixD.yes.data.type.S (= suffixD.yes.data)
      • ./data/suffixD.yes.data.type.ZS (must be 0)
      • ./data/suffixD.yes.data.type.PS (must be 0)
      • ./data/suffixD.yes.data.type.SS (should be 0)
      • ./data/suffixD.yes.data.type.U (must be 0)

    6: Add negation tag (N|O), sort -u: for annual suffixD
    generate ./data/suffixD.yes.data.${YEAR}

    10
    => generate ./data/

    7: Get stats for sd-Rule from suffixD.tag.txt. Use this option to generate all suffixD pairs for a specified suffix (check the suffixD.rawNo.rpt)

  • send data/suffixD.tbd.data to linguists for tagging:
    • derivation: yes|no
  • re-run this process until all suffixD are tagged (0 in suffixD.tbd.data)
  • The final suffixD is in ${DERIVATION}/suffixD/data/${YEAR}/data/suffixD.yes.data.${YEAR}
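
The path from the meta file to the final file (Steps 4 and 6) can be sketched as follows. The record layout `word1|cat1|eui1|word2|cat2|eui2|tag` is an assumption, as are the choices to test both words of the pair for the "less" ending and to treat unrecognized tags as untagged. The function names are hypothetical and the output layout is illustrative only. It is not the behavior of SplitSuffixDMetaFile.java or AddNegationTagToFile.java.

```python
import re

# Step 6 rule from the summary: ~less$ is [N], everything else is [O].
NEGATION_SUFFIX = re.compile(r"less$")


def split_by_tag(meta_lines):
    """Group meta records by their last field (yes|no|tbd)."""
    groups = {"yes": [], "no": [], "tbd": []}
    for line in meta_lines:
        rec = line.strip()
        if not rec or rec.startswith("#"):
            continue
        tag = rec.rsplit("|", 1)[-1]
        # Assumption: anything that is not yes/no/tbd is treated as untagged.
        groups[tag if tag in groups else "tbd"].append(rec)
    return groups


def add_negation_tag(yes_records):
    """Append N or O to each record, then sort uniquely (like sort -u)."""
    tagged = set()
    for rec in yes_records:
        fields = rec.split("|")
        # Assumed positions of the two words in the record.
        word1, word2 = fields[0], fields[3]
        is_negation = any(NEGATION_SUFFIX.search(w) for w in (word1, word2))
        tagged.add(rec + "|" + ("N" if is_negation else "O"))
    return sorted(tagged)
```

In this sketch the process is complete only when `split_by_tag(...)["tbd"]` is empty. Until then, the untagged records go back to the linguists and the steps are re-run.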

Please refer to the derivation design documents in Lexical Tools for details.