
The SPECIALIST Lexicon

Derivations Procedures - zeroD

Generate zeroD pairs (facts) in derivation table:

I. Directory:

  • ${DERIVATION}/4.zeroD

II. Input Files (./data/${YEAR}/dataOrg/):
shell> ${ZERO_D}/bin/GetZeroD ${YEAR}
0

  • LEXICON:
    => link from ${LEXICON_DIR}/LEXICON.release
    => link LEXICON to LEXICON.${YEAR}
  • zeroD.tag.txt:
    => copy from zeroD.tag.txt.${PREV_YEAR}
    => link zeroD.tag.txt to zeroD.tag.txt.${YEAR}
    => need to be updated in step 4
  • zeroD.tagYes.txt:
    => copy from zeroD.tagYes.txt.${PREV_YEAR}
    => link zeroD.tagYes.txt to zeroD.tagYes.txt.${YEAR}
    => might need to be updated in step 8; however, it should be empty after 2014

  • Must complete nomD first for auto-tag program to work!
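
The copy-and-link preparation listed above can be sketched in shell. This is only an illustration under stated assumptions: the function name `prepare_zero_d_inputs` and its argument order are hypothetical, and the real step 0 of GetZeroD may prepare these files differently.

```shell
# Hypothetical helper (not part of GetZeroD): prepare the dataOrg inputs.
# Arguments: dataOrg directory, release year, previous year, path to LEXICON.release
prepare_zero_d_inputs() {
    data_org=$1; year=$2; prev_year=$3; lexicon_release=$4
    mkdir -p "$data_org" || return 1

    # LEXICON.${YEAR} links to the released lexicon; LEXICON links to LEXICON.${YEAR}.
    # Pass an absolute path for the release file so the link resolves from any directory.
    ln -sf "$lexicon_release" "$data_org/LEXICON.$year"
    ln -sf "LEXICON.$year" "$data_org/LEXICON"

    # Tag files start as a copy of last year's file, reached through an unversioned link.
    for f in zeroD.tag.txt zeroD.tagYes.txt; do
        cp "$data_org/$f.$prev_year" "$data_org/$f.$year" || return 1
        ln -sf "$f.$year" "$data_org/$f"
    done
}
```

A call might look like `prepare_zero_d_inputs ./data/2018/dataOrg 2018 2017 /abs/path/LEXICON.release`, where the paths are placeholders.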

III. Final files for allD (release)

  • ${TAR_DIR}/zeroD.yes.data.${YEAR}

IV. Summary of GetZeroD

Step | Description and Program | Input | Output | Notes
0
  • Prepare directories and files
Input: See section II. Output: See section II.
  • 4.zeroD/data/${YEAR}/dataOrg
    • LEXICON
    • zeroD.tag.txt
    • zeroD.tagYes.txt
1
  • Get valid base from LEXICON
  • GetBasesFromLexicon.java
  • ${SRC_DIR}:
    • LEXICON
  • bases.data
Get bases (citation and spVars) from Lexicon, excluding:
  • abbreviations
  • acronyms
  • bases shorter than 2 characters (min. length is 2)
2
  • Retrieve std-raw zeroD pairs
  • GetZeroDRawFromBaseFile.java
  • ${TAR_DIR}:
    • bases.data
  • zeroD.raw.data.fromBase
  • The raw zeroD pair should include both upper and lower cases
3
  • Combine with nomD.Z file (raw)
  • CheckWithNomDFile.java
  • ${NOM_TAR_DIR}:
    • nomD.yes.Z.data.${YEAR}

  • ${TAR_DIR}:
    • zeroD.raw.data.fromBase
  • zeroD.raw.data.fromNomD
  • zeroD.raw.data
  • Log shows all zeroDs from nomD that are not included in zeroD from base
4
  • Add tags to zeroD raw file
  • GetZeroDMetaFile.java
  • DPairTagList.java
  • ${NOM_TAR_DIR}:
    • nomD.yes.Z.data.${YEAR}

  • ${SRC_DIR}:
    • zeroD.tag.txt

  • ${TAR_DIR}:
    • zeroD.raw.data
  • zeroD.meta.data
  • zeroD.meta.data.conflict
The following must be 0:
Check on "(must)" in log.4
  • -- Total invalid tag no (must = 0): 0
  • -- Empty line no (must = 0): 0
  • -- Invalid tag no (must = 0): 0
    => Send to linguists in Step 5
  • - conflict (yes|no) tag no (must = 0): 0
    => Check if tag on zeroD is consistent between two records (by SpVars)
    => if not 0, send the file (zeroD.meta.data.conflict) to linguists to verify
    => Use the tagged file to manually update ${SRC_DIR}/zeroD.tag.txt
  • - none (tbd) tag no (must be 0): 0
5
  • Split tags on zeroD meta file [yes|no|tbd]
  • SplitZeroDMetaFile.java
  • ${TAR_DIR}:
    • zeroD.meta.data
  • zeroD.yes.data
  • zeroD.no.data
  • zeroD.tbd.data
If the zeroD.tbd.data is not empty:
  • Send the tbd dPairs (zeroD.tbd.data) to linguists to tag [yes|no].
  • Put the tagged file in ${DERIVATION}/data/${YEAR}/dataOrg/Tags/zeroD.tbd.data.tagged.txt
  • Append the tagged file to ./dataOrg/zeroD.tag.txt
  • Then run step 5a and rerun Steps 4~5 until zeroD.tbd.data is empty
5a
  • Clean up tags on tagged file
  • CleanUpDPairTagList.java
  • ${SRC_DIR}:
    • zeroD.tag.txt
  • ${SRC_DIR}:
    • zeroD.tag.txt.cleanUp
Re-run this step until:
  • conflict = 0
    If not, send conflict (from log.5a) to linguists to re-tag.
    Do NOT replace zeroD.tag.txt with zeroD.tag.txt.cleanUp until conflict = 0
  • duplicate = 0
    If not, replace zeroD.tag.txt with zeroD.tag.txt.cleanUp
  • diff = 0
    If not, replace zeroD.tag.txt with zeroD.tag.txt.cleanUp
  • Then, rerun Steps: 4~5 until conflict no, tbd no, invalid tag no, etc. are 0
6
  • Verify dType on valid zeroD
  • DType.java
  • ${ALL_SRC_DIR}:
    • LRSPL
    • dTypeStr.data

  • ${TAR_DIR}:
    • zeroD.yes.data
  • zeroD.yes.data.type
  • zeroD.yes.data.type.Z
  • zeroD.yes.data.type.S
  • zeroD.yes.data.type.P
  • zeroD.yes.data.type.ZS
  • zeroD.yes.data.type.SS
  • zeroD.yes.data.type.PS
  • zeroD.yes.data.type.U
The following should be 0
  • prefixD (|P|)
  • suffixD (|S|)
  • zeroD by SpVars (|ZS|)
  • prefixD by SpVars (|PS|)
  • suffixD by SpVars (|SS|)
  • unknown dType (|U|)
7
  • Automatically add negation tag [O] to all valid zeroD pairs, then sort -u
  • AddNegationTagToFile.java
  • DPairTagList.java
  • ${TAR_DIR}:
    • zeroD.yes.data
  • zeroD.yes.data.${YEAR}
  • zeroD.yes.data.${YEAR}.conflict
The following should be 0
  • -- Empty line no (must = 0): 0
  • -- Invalid tag no (must = 0): 0
  • - none (tbd) tag no (must be 0): 0

If the "conflict (N|O) tag no:" is not 0:

  • The conflict file (zeroD.yes.data.${YEAR}.conflict, e.g. zeroD.yes.data.2018.conflict) lists all inconsistent zeroD tags between SpVars in two records
  • Send conflicts to linguists to tag [yes|no|both] on EUI lines
  • In the past, there have been no [both] cases in zeroD
  • Manually update the results to zeroD.tag.txt
  • Re-run Steps: 4~7

If the "conflict (N|O) tag no:" is 0:

  • program generates zeroD.yes.data.${YEAR}
8
  • Check affix on zeroD
  • CheckDerivationByAffix6.java
  • ${ALL_SRC_DIR}:
    • LRSPL

  • ${SRC_DIR}:
    • zeroD.tagYes.txt

  • ${TAR_DIR}:
    • zeroD.yes.data.${YEAR}
  • zeroD.pattern3.rpt
  • The number of possible invalid zeroD: 0 (should be 0)
  • Make sure zeroD.pattern3.rpt is empty. If not, send to linguist to tag [Yes|No]:
    • invalid dPair [No]: add to zeroD.tagNo.txt, then rerun Steps: 3~6 (?? this file is never used in zeroD)
    • valid dPair [Yes]: add to zeroD.tagYes.txt, then rerun Steps: 3~6
  • Please note that both of the above files should be empty because there are no exceptions to the affix check on zeroD
9
  • Steps 1 ~ 8
Input: See above. Output: See above. Notes: Not recommended!
10
  • Auto-fix zeroD.tag.txt for conflicts by SpVar
  • FixConflictDPairTags.java

  • ${SRC_DIR}:
    • zeroD.tag.txt.${YEAR}
    • zeroD.meta.data.conflict.tag.data
    • zeroD.tag.txt.${YEAR}.fixDPair
  • Not used after 2014+!
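
Steps 4 and 7 above require every count on a "(must ...)" line of the log to be 0. A minimal sketch of that check, assuming the log lines look like the samples quoted in the table (a "(must" marker followed by ": <count>" at the end of the line); the function name `check_must_zero` is hypothetical:

```shell
# Hypothetical helper: print every "(must ...)" log line whose count is not 0.
# Returns 0 when all such counts are 0 (or no such lines exist), 1 otherwise.
check_must_zero() {
    log=$1
    # Keep the "(must" lines, then drop those ending in ": 0".
    bad=$(grep '(must' "$log" | grep -v ': *0 *$' || true)
    if [ -n "$bad" ]; then
        printf '%s\n' "$bad"
        return 1
    fi
    return 0
}
```

For example, `check_must_zero ./Logs/${YEAR}/log.4` would print nothing and succeed when step 4 is clean.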

    V. Process Details:
    Save messages to log.${STEP} in ./Logs/${YEAR}/

    • shell>cd ${DERIVATION}/zeroD/bin
    • shell>GetZeroD ${YEAR} > log.${STEP}
    • shell>mv log.${STEP} ./${YEAR}/.
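
The run-and-save pattern above can be wrapped in a small function. This is a sketch only: `run_and_log` is a hypothetical name, and it assumes the program's standard output is all that needs to be captured.

```shell
# Hypothetical helper: run a command and save its output as log.${STEP}
# under a per-year log directory.
# Arguments: step, year, log root directory, then the command and its arguments.
run_and_log() {
    step=$1; year=$2; log_root=$3
    shift 3
    mkdir -p "$log_root/$year" || return 1
    # Redirect stdout of the given command straight into the step log.
    "$@" > "$log_root/$year/log.$step"
}
```

Usage would be along the lines of `run_and_log 4 ${YEAR} ./Logs GetZeroD ${YEAR}`.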

      0: Prepare directories and files
      => generates: 4.zeroD/data/dataOrg/*

      1: Get valid base from LEXICON (no abb/acr, min. size=2)
      => generates ./data/bases.data (no abb/acr, min. size=2)

      2: Retrieve std-raw zeroD pairs
      => generates ./data/zeroD.raw.data.fromBase

      3: Check/integrate with nomD.Z file (raw)
      => generates:

      • ./data/zeroD.raw.data.fromNomD (new ZD from nomD)
      • ./data/zeroD.raw.data (= zeroD.raw.data.fromBase + zeroD.raw.data.fromNomD)

      4: Add tags to zeroD meta file (meta)
      => generates ./data/zeroD.meta.data

      • Make sure there is no duplicated tag in ./dataOrg/zeroD.tag.txt
        shell>sort -u zeroD.tag.txt |wc -l
      • Program automatically tags nomD.Z as valid zeroD pairs
      • Duplicated dPairs are OK (from nomD)
      • Correct all conflict dPairs (from nomD)
        => verify with linguists
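
The duplicate check in step 4 compares the line count of zeroD.tag.txt with the count after `sort -u`. A minimal sketch that also lists the offending lines; the function names are hypothetical, and it treats two tags as duplicates only when the whole lines are identical:

```shell
# Hypothetical helper: print each line that occurs more than once in a tag file.
list_duplicate_tags() {
    sort "$1" | uniq -d
}

# Hypothetical helper: succeed only if total and unique line counts agree.
has_no_duplicate_tags() {
    total=$(wc -l < "$1")
    unique=$(sort -u "$1" | wc -l)
    # Arithmetic expansion strips any padding that wc adds on some systems.
    [ "$((total))" -eq "$((unique))" ]
}
```

Running `list_duplicate_tags ./dataOrg/zeroD.tag.txt` before step 4 shows which lines to remove.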

      5: Split tags on zeroD meta file (yes|no|tbd)
      => generates

      • ./data/zeroD.yes.data
      • ./data/zeroD.no.data
      • ./data/zeroD.tbd.data (should be empty once the annual update is completed)
        => send to linguists to tag for this annual update, then add the updates to ./dataOrg/zeroD.tag.txt.${YEAR}
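
To illustrate the idea of the split in step 5, here is a sketch in shell. It is NOT SplitZeroDMetaFile.java: the record layout is an assumption (one pair per line, '|'-separated, with the tag in the last field), and the function name `split_by_tag` is hypothetical. Any line whose last field is neither yes nor no is treated as tbd in this sketch.

```shell
# Hypothetical helper: split a meta file into yes/no/tbd files by tag.
# ASSUMPTION: '|'-separated fields with the tag (yes|no|tbd) in the last field.
split_by_tag() {
    meta=$1; out_dir=$2
    # Create or truncate all three outputs so empty results still yield files.
    : > "$out_dir/zeroD.yes.data"
    : > "$out_dir/zeroD.no.data"
    : > "$out_dir/zeroD.tbd.data"
    awk -F'|' -v dir="$out_dir" '
        $NF == "yes" { print > (dir "/zeroD.yes.data"); next }
        $NF == "no"  { print > (dir "/zeroD.no.data");  next }
                     { print > (dir "/zeroD.tbd.data") }
    ' "$meta"
}
```

An empty zeroD.tbd.data afterwards (`[ ! -s zeroD.tbd.data ]`) corresponds to the "should be empty" condition above.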

      6: Verify dType on zeroD.yes.data
      => generates:

      • ./data/zeroD.yes.data.type
        • ./data/zeroD.yes.data.type.Z (= zeroD.yes.data)
        • ./data/zeroD.yes.data.type.P (must be 0)
        • ./data/zeroD.yes.data.type.S (must be 0)
        • ./data/zeroD.yes.data.type.ZS (should be 0)
        • ./data/zeroD.yes.data.type.PS (must be 0)
        • ./data/zeroD.yes.data.type.SS (must be 0)
        • ./data/zeroD.yes.data.type.U (must be 0)
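
The "must be 0" conditions in step 6 amount to checking that every type file except .Z is empty. A minimal sketch, assuming the files sit together in one directory; the function name `nonempty_type_files` is hypothetical:

```shell
# Hypothetical helper: list dType split files that are unexpectedly non-empty.
# The .Z file is skipped because it is expected to hold the valid zeroD pairs.
nonempty_type_files() {
    data_dir=$1
    for t in P S ZS PS SS U; do
        f="$data_dir/zeroD.yes.data.type.$t"
        # -s is true only when the file exists and has a size greater than 0.
        if [ -s "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

No output from `nonempty_type_files ./data` means all six counts are 0.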

      7: Add negation tag (O), sort -u: for the annual zeroD
      => generates ./data/zeroD.yes.data.${YEAR}

      7.1: Check conflict (inconsistent) tags between spVars
      => generates ./data/zeroD.yes.data.${YEAR}.conflict
      => Ideally, the tag of zeroD between two records should be the same.
      => This file lists all inconsistent zeroD tags between two records (caused by SpVars).
      => If not empty, send to linguists to tag (yes|no|both) the EUI line.

      • yes: all tags between these two records should be yes
      • no: all tags between these two records should be no
      • both: tags between these two records can be yes or no. In the past, no zeroD pair has been tagged both.

      Manually update the results in zeroD.tag.txt, then rerun Steps 4~7.

      8: Add negation tag (O), sort uniquely for the annual zeroD
      => runs the above Steps 1 ~ 7

    • send zeroD.tbd.data to linguists for tagging (yes|no)
    • re-run this process until all zeroD are tagged (0 in zeroD.tbd.data)
    • The final zeroD is in ${DERIVATION}/zeroD/data/${YEAR}/data/zeroD.yes.data.${YEAR}
    • Work on zeroD in the Derivation table growth

    Please refer to derivation design documents in Lexical Tools for details.