The SPECIALIST Lexicon

Derivation Procedures - zeroD

Generate zeroD (zero derivation) pairs (facts) in the derivation table:

I. Directory:

  • ${DERIVATION}/4.zeroD

II. Input Files (./data/${YEAR}/dataOrg/), prepared by step 0:
shell> ${ZERO_D}/bin/GetZeroD ${YEAR}
0   (step 0: prepare directories and files)

  • LEXICON:
    => link from ${LEXICON_DIR}/LEXICON.release
    => link LEXICON to LEXICON.${YEAR}
  • zeroD.tag.txt:
    => copy from zeroD.tag.txt.${PREV_YEAR}
    => link zeroD.tag.txt to zeroD.tag.txt.${YEAR}
    => needs to be updated in step 4
  • zeroD.tagYes.txt:
    => copy from zeroD.tagYes.txt.${PREV_YEAR}
    => link zeroD.tagYes.txt to zeroD.tagYes.txt.${YEAR}
    => might need to be updated in step 8; however, it should be empty after 2014

  • nomD must be completed first for the auto-tag program to work!

III. Final files for allD (release)

  • ${TAR_DIR}/zeroD.yes.data.${YEAR}

IV. Summary of GetZeroD

Each step below lists its description, program, input, output, and notes.

0: Prepare directories and files
  • Input: see section II
  • Output: see section II
  • Notes: files in 4.zeroD/data/${YEAR}/dataOrg:
    • LEXICON
    • zeroD.tag.txt
    • zeroD.tagYes.txt
1: Get valid bases from LEXICON
  • Program: GetBasesFromLexicon.java
  • Input (${SRC_DIR}): LEXICON
  • Output: bases.data
  • Notes: get bases (citations and spVars) from LEXICON, excluding:
    • abbreviations
    • acronyms
    • bases shorter than 2 characters (min. length is 2)
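The step 1 filter rules can be sketched in Java, the language of the GetZeroD programs. This is a minimal illustration, not GetBasesFromLexicon.java: the `BaseFilter` class and its `Base` record are hypothetical, and the sketch assumes the abbreviation and acronym status of each base has already been read from LEXICON.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the step 1 filter rules; this is NOT GetBasesFromLexicon.java.
// Assumes the bases (citations and spVars) were already read from LEXICON and that
// each one carries flags saying whether its record is an abbreviation or an acronym.
final class BaseFilter {
    static final int MIN_LENGTH = 2; // "min. length is 2"

    /** Hypothetical holder for one base read from LEXICON. */
    record Base(String spelling, boolean abbreviation, boolean acronym) {}

    /** True if the base would be kept in bases.data under the step 1 rules. */
    static boolean isValidBase(Base base) {
        return !base.abbreviation()
                && !base.acronym()
                && base.spelling().length() >= MIN_LENGTH;
    }

    /** Keeps the valid bases, preserving input order. */
    static List<Base> filter(List<Base> bases) {
        List<Base> kept = new ArrayList<>();
        for (Base base : bases) {
            if (isValidBase(base)) {
                kept.add(base);
            }
        }
        return kept;
    }
}
```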
2: Retrieve std-raw zeroD pairs
  • Program: GetZeroDRawFromBaseFile.java
  • Input (${TAR_DIR}): bases.data
  • Output: zeroD.raw.data.fromBase
  • Notes: the raw zeroD pairs should include both upper and lower cases
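A minimal sketch of how step 2 might pair bases, assuming (the document does not say so explicitly) that a zeroD pair joins two records whose bases share a spelling, compared case-insensitively so that upper- and lower-case forms meet, but differ in category. `ZeroDRaw`, `Entry`, and the `|`-separated output layout are hypothetical; the real GetZeroDRawFromBaseFile.java and the zeroD.raw.data.fromBase format may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of step 2, NOT GetZeroDRawFromBaseFile.java. Assumes a zeroD
// pair joins two records whose bases are spelled the same (ignoring case) but whose
// categories differ, and uses an invented output layout.
final class ZeroDRaw {
    /** Hypothetical row of bases.data: record EUI, base spelling, category. */
    record Entry(String eui, String base, String category) {}

    static List<String> rawPairs(List<Entry> entries) {
        // Group by lower-cased spelling so that upper- and lower-case forms meet.
        Map<String, List<Entry>> bySpelling = new LinkedHashMap<>();
        for (Entry e : entries) {
            bySpelling.computeIfAbsent(e.base().toLowerCase(Locale.ROOT), k -> new ArrayList<>()).add(e);
        }
        List<String> pairs = new ArrayList<>();
        for (List<Entry> group : bySpelling.values()) {
            for (int i = 0; i < group.size(); i++) {
                for (int j = i + 1; j < group.size(); j++) {
                    Entry a = group.get(i);
                    Entry b = group.get(j);
                    // Zero derivation: same spelling, different category, different record.
                    if (!a.category().equals(b.category()) && !a.eui().equals(b.eui())) {
                        // Invented layout: srcBase|srcCat|srcEui|tarBase|tarCat|tarEui
                        pairs.add(String.join("|", a.base(), a.category(), a.eui(),
                                b.base(), b.category(), b.eui()));
                    }
                }
            }
        }
        return pairs;
    }
}
```

Each pair is emitted once, in input order; whether the real file also lists the reverse direction is not stated here.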
3: Combine with nomD.Z file (raw)
  • Program: CheckWithNomDFile.java
  • Input:
    • ${NOM_TAR_DIR}: nomD.yes.Z.data.${YEAR}
    • ${TAR_DIR}: zeroD.raw.data.fromBase
  • Output:
    • zeroD.raw.data.fromNomD
    • zeroD.raw.data
  • Notes: the log lists all zeroD pairs from nomD that are not included in the zeroD pairs from bases
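Step 3 amounts to a set difference and a union. The hypothetical sketch below assumes the nomD.Z pairs have already been rewritten into the same line format as zeroD.raw.data.fromBase, so identical pairs compare equal as strings; CheckWithNomDFile.java itself may match pairs differently.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of step 3, NOT CheckWithNomDFile.java. Assumes the nomD.Z pairs
// have already been reduced to the same line format as zeroD.raw.data.fromBase, so that
// plain string comparison identifies the same pair.
final class CombineWithNomD {
    /** Pairs in nomD.Z that are missing from the base-derived pairs (zeroD.raw.data.fromNomD). */
    static Set<String> fromNomD(Set<String> fromBase, Set<String> nomDZ) {
        Set<String> missing = new LinkedHashSet<>(nomDZ);
        missing.removeAll(fromBase);
        return missing;
    }

    /** zeroD.raw.data = zeroD.raw.data.fromBase + zeroD.raw.data.fromNomD. */
    static Set<String> combine(Set<String> fromBase, Set<String> nomDZ) {
        Set<String> all = new LinkedHashSet<>(fromBase);
        all.addAll(fromNomD(fromBase, nomDZ));
        return all;
    }
}
```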
4: Add tags to zeroD raw file
  • Programs: GetZeroDMetaFile.java, DPairTagList.java
  • Input:
    • ${NOM_TAR_DIR}: nomD.yes.Z.data.${YEAR}
    • ${SRC_DIR}: zeroD.tag.txt
    • ${TAR_DIR}: zeroD.raw.data
  • Output:
    • zeroD.meta.data
    • zeroD.meta.data.conflict
  • Notes: check the "(must" lines in log.4; the following must be 0:
    • -- Total invalid tag no (must = 0): 0
    • -- Empty line no (must = 0): 0
    • -- Invalid tag no (must = 0): 0
      => send to linguists in Step 5
    • - conflict (yes|no) tag no (must = 0): 0
      => checks whether the tags on a zeroD are consistent between two records (by SpVars)
      => if not 0, send the file (zeroD.meta.data.conflict) to linguists to verify
      => use the tagged file to manually update ${SRC_DIR}/zeroD.tag.txt
    • - none (tbd) tag no (must be 0): 0
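Checking the "(must" lines of log.4 by eye is error-prone. A small hypothetical helper (not part of the GetZeroD tools) can flag any non-zero count, assuming each counter line ends with a colon followed by the count, as in the log lines listed for this step. The same check applies to the step 7 counters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the GetZeroD tools. Assumes log.4 contains count
// lines shaped like "-- Invalid tag no (must = 0): 0", i.e. a "(must" marker followed
// by a colon and the count at the end of the line.
final class MustCheck {
    /** Returns every "(must" line whose trailing count is not 0 (or cannot be read). */
    static List<String> failures(List<String> logLines) {
        List<String> bad = new ArrayList<>();
        for (String line : logLines) {
            if (!line.contains("(must")) {
                continue; // not a must-be-zero counter
            }
            String count = line.substring(line.lastIndexOf(':') + 1).trim();
            try {
                if (Integer.parseInt(count) != 0) {
                    bad.add(line);
                }
            } catch (NumberFormatException e) {
                bad.add(line); // unreadable count: flag it for a manual look
            }
        }
        return bad;
    }
}
```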
5: Split tags on zeroD meta file [yes|no|tbd]
  • Program: SplitZeroDMetaFile.java
  • Input (${TAR_DIR}): zeroD.meta.data
  • Output:
    • zeroD.yes.data
    • zeroD.no.data
    • zeroD.tbd.data
  • Notes: if zeroD.tbd.data is not empty:
    • Send the tbd dPairs (zeroD.tbd.data) to linguists to tag [yes|no].
    • Put the tagged file in ${DERIVATION}/data/${YEAR}/dataOrg/Tags/zeroD.tbd.data.tagged.txt
    • Append the tagged file to ./dataOrg/zeroD.tag.txt
    • Then run step 5a, and rerun Steps 4~5 until zeroD.tbd.data is empty
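The split in step 5 can be pictured as bucketing lines by tag. This sketch is not SplitZeroDMetaFile.java; it assumes, hypothetically, that each zeroD.meta.data line ends with a `|`-separated tag of yes, no, or tbd, and it treats any unrecognized tag as tbd so that the line goes back to the linguists.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of step 5, NOT SplitZeroDMetaFile.java. Assumes each line of
// zeroD.meta.data ends with a "|"-separated tag field holding yes, no, or tbd.
final class SplitMeta {
    /** Buckets lines into "yes", "no", and "tbd"; an unrecognized tag counts as tbd. */
    static Map<String, List<String>> split(List<String> metaLines) {
        Map<String, List<String>> buckets = new LinkedHashMap<>();
        buckets.put("yes", new ArrayList<>()); // -> zeroD.yes.data
        buckets.put("no", new ArrayList<>());  // -> zeroD.no.data
        buckets.put("tbd", new ArrayList<>()); // -> zeroD.tbd.data
        for (String line : metaLines) {
            String tag = line.substring(line.lastIndexOf('|') + 1).trim().toLowerCase(Locale.ROOT);
            // Anything that is not clearly yes or no still needs a linguist.
            buckets.getOrDefault(tag, buckets.get("tbd")).add(line);
        }
        return buckets;
    }
}
```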
5a: Clean up tags on tagged file
  • Program: CleanUpDPairTagList.java
  • Input (${SRC_DIR}): zeroD.tag.txt
  • Output (${SRC_DIR}): zeroD.tag.txt.cleanUp
  • Notes: re-run this step until:
    • conflict = 0
      If not, send the conflicts (from log.5a) to linguists to re-tag.
      Do NOT replace zeroD.tag.txt with zeroD.tag.txt.cleanUp until conflict = 0.
    • duplicate = 0
      If not, replace zeroD.tag.txt with zeroD.tag.txt.cleanUp.
    • diff = 0
      If not, replace zeroD.tag.txt with zeroD.tag.txt.cleanUp.
    • Then rerun Steps 4~5 until the conflict, tbd, invalid tag, and other counts are all 0.
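The duplicate and conflict counts checked in step 5a can be illustrated as follows. This is a hypothetical sketch, not CleanUpDPairTagList.java: it assumes each zeroD.tag.txt line is a dPair followed by `|` and its tag, keeps the first tag seen for each dPair, and does not model the diff count.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the duplicate and conflict counts in step 5a, NOT
// CleanUpDPairTagList.java. Assumes each line of zeroD.tag.txt is "<dPair>|<tag>",
// with the tag in the last "|"-separated field. The "diff" count is not modeled.
final class TagCleanUp {
    int duplicates = 0; // same dPair, same tag
    int conflicts = 0;  // same dPair, different tag
    // dPair -> first tag seen; only safe to write out once conflicts == 0
    final Map<String, String> cleaned = new LinkedHashMap<>();

    TagCleanUp(List<String> tagLines) {
        for (String line : tagLines) {
            int bar = line.lastIndexOf('|');
            if (bar < 0) {
                continue; // malformed line; a real tool would report it
            }
            String dPair = line.substring(0, bar);
            String tag = line.substring(bar + 1);
            String previous = cleaned.putIfAbsent(dPair, tag);
            if (previous == null) {
                continue; // first occurrence of this dPair
            }
            if (previous.equals(tag)) {
                duplicates++;
            } else {
                conflicts++;
            }
        }
    }
}
```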
6: Verify dType on valid zeroD
  • Program: DType.java
  • Input:
    • ${ALL_SRC_DIR}: LRSPL, dTypeStr.data
    • ${TAR_DIR}: zeroD.yes.data
  • Output:
    • zeroD.yes.data.type
    • zeroD.yes.data.type.Z
    • zeroD.yes.data.type.S
    • zeroD.yes.data.type.P
    • zeroD.yes.data.type.ZS
    • zeroD.yes.data.type.SS
    • zeroD.yes.data.type.PS
    • zeroD.yes.data.type.U
  • Notes: the following should be 0:
    • prefixD (|P|)
    • suffixD (|S|)
    • zeroD by SpVars (|ZS|)
    • prefixD by SpVars (|PS|)
    • suffixD by SpVars (|SS|)
    • unknown dType (|U|)
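To illustrate what the dType categories in step 6 mean for a single pair, here is a deliberately simplified classifier. It is not DType.java, which reads LRSPL and dTypeStr.data; the `spVars` map (lower-case variant to canonical spelling) is a hypothetical stand-in for LRSPL. Real prefix and suffix derivations often involve spelling changes (for example, happy to happiness) that plain string matching cannot see, so such pairs fall through to U here.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical, simplified sketch of the step 6 categories, NOT DType.java (which uses
// LRSPL and dTypeStr.data). The spVars map (lower-case variant -> canonical spelling)
// is an invented stand-in for LRSPL. Checks run in order: Z, ZS, P, S, PS, SS, else U.
final class DTypeSketch {
    static String canonical(String word, Map<String, String> spVars) {
        String lower = word.toLowerCase(Locale.ROOT);
        return spVars.getOrDefault(lower, lower);
    }

    static boolean hasAffix(String longer, String shorter, boolean prefix) {
        if (longer.length() <= shorter.length()) {
            return false;
        }
        // A prefix is added in front, so the base sits at the end, and vice versa.
        return prefix ? longer.endsWith(shorter) : longer.startsWith(shorter);
    }

    static String dType(String src, String tar, Map<String, String> spVars) {
        String s = src.toLowerCase(Locale.ROOT);
        String t = tar.toLowerCase(Locale.ROOT);
        String cs = canonical(src, spVars);
        String ct = canonical(tar, spVars);
        if (s.equals(t)) return "Z";                  // zeroD
        if (cs.equals(ct)) return "ZS";               // zeroD by SpVars
        if (hasAffix(t, s, true)) return "P";         // prefixD
        if (hasAffix(t, s, false)) return "S";        // suffixD
        if (hasAffix(ct, cs, true)) return "PS";      // prefixD by SpVars
        if (hasAffix(ct, cs, false)) return "SS";     // suffixD by SpVars
        return "U";                                   // unknown
    }
}
```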
7: Automatically add negation tag [O] to all valid zeroD pairs, then sort -u
  • Programs: AddNegationTagToFile.java, DPairTagList.java
  • Input (${TAR_DIR}): zeroD.yes.data
  • Output:
    • zeroD.yes.data.${YEAR}
    • zeroD.yes.data.${YEAR}.conflict
  • Notes: the following should be 0:
    • -- Empty line no (must = 0): 0
    • -- Invalid tag no (must = 0): 0
    • - none (tbd) tag no (must be 0): 0

    If the "conflict (N|O) tag no:" is not 0:

    • The conflict file (zeroD.yes.data.${YEAR}.conflict) lists all inconsistent zeroD tags between SpVars in two records
    • Send the conflicts to linguists to tag [yes|no|both] on the EUI lines
    • In the past, there have been no "both" cases in zeroD
    • Manually update zeroD.tag.txt with the results
    • Re-run Steps 4~7

    If the "conflict (N|O) tag no:" is 0:

    • the program generates zeroD.yes.data.${YEAR}
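Step 7's "add [O], then sort -u" can be sketched with a `TreeSet`, which both sorts and removes duplicates. The sketch assumes, hypothetically, that the negation tag is appended as a final `|`-separated field; the layout AddNegationTagToFile.java actually writes may differ. `TreeSet` orders strings by UTF-16 code unit, which matches `sort -u` under `LC_ALL=C` for ASCII text but not necessarily under other locales.

```java
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch of step 7, NOT AddNegationTagToFile.java. Assumes the negation
// tag is appended as a final "|"-separated field.
final class AddNegationTag {
    /** Appends "|O" to every valid zeroD line, then sorts and removes duplicates. */
    static List<String> addTag(List<String> yesLines) {
        TreeSet<String> sorted = new TreeSet<>(); // sorted, no duplicates (like sort -u)
        for (String line : yesLines) {
            sorted.add(line + "|O"); // O: no negation
        }
        return List.copyOf(sorted); // keeps the TreeSet's sorted order
    }
}
```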
8: Check affix on zeroD
  • Program: CheckDerivationByAffix6.java
  • Input:
    • ${ALL_SRC_DIR}: LRSPL
    • ${SRC_DIR}: zeroD.tagYes.txt
    • ${TAR_DIR}: zeroD.yes.data.${YEAR}
  • Output: zeroD.pattern3.rpt
  • Notes:
    • The number of possible invalid zeroD: 0 (should be 0)
    • Make sure zeroD.pattern3.rpt is empty. If not, send it to linguists to tag [Yes|No]:
      • invalid dPair [No]: add to zeroD.tagNo.txt, then rerun Steps 3~6 (note: zeroD.tagNo.txt is never used in zeroD)
      • valid dPair [Yes]: add to zeroD.tagYes.txt, then rerun Steps 3~6
    • Both files above should be empty, because there are no exceptions to the affix check on zeroD.
9: Steps 1 ~ 8
  • Input: see above
  • Output: see above
  • Notes: not recommended!
10: Auto-fix zeroD.tag.txt for conflicts by SpVar
  • Program: FixConflictDPairTags.java
  • Input (${SRC_DIR}):
    • zeroD.tag.txt.${YEAR}
    • zeroD.meta.data.conflict.tag.data
  • Output (${SRC_DIR}): zeroD.tag.txt.${YEAR}.fixDPair
  • Notes: not used after 2014!

V. Process Details:
Save messages to log.${STEP} in ./Logs/${YEAR}/:

    • shell>cd ${DERIVATION}/zeroD/bin
    • shell>GetZeroD ${YEAR} > log.${STEP}
    • shell>mv log.${STEP} ./${YEAR}/.

      0: Prepare directories and files
      => generates: 4.zeroD/data/${YEAR}/dataOrg/*

      1: Get valid bases from LEXICON (no abb/acr, min. size=2)
      => generates ./data/bases.data

      2: Retrieve std-raw zeroD pairs
      => generates ./data/zeroD.raw.data.fromBase

      3: Check/integrate with nomD.Z file (raw)
      => generates:

      • ./data/zeroD.raw.data.fromNomD (new zeroD pairs from nomD)
      • ./data/zeroD.raw.data (= zeroD.raw.data.fromBase + zeroD.raw.data.fromNomD)

      4: Add tags to zeroD raw file (meta)
      => generates ./data/zeroD.meta.data

      • Make sure there are no duplicated tags in ./dataOrg/zeroD.tag.txt; the two counts below must match:
        shell>sort -u zeroD.tag.txt |wc -l
        shell>wc -l < zeroD.tag.txt
      • The program automatically tags nomD.Z pairs as valid zeroD pairs
      • Duplicated dPairs (from nomD) are OK
      • Correct all conflicting dPairs (from nomD)
        => verify with linguists

      5: Split tags on zeroD meta file (yes|no|tbd)
      => generates

      • ./data/zeroD.yes.data
      • ./data/zeroD.no.data
      • ./data/zeroD.tbd.data (should be empty once the annual update is complete)
        => send to linguists to tag for this annual update, then add the results to ./dataOrg/zeroD.tag.txt.${YEAR}

      6: Verify dType on zeroD.yes.data
      => generates:

      • ./data/zeroD.yes.data.type
        • ./data/zeroD.yes.data.type.Z (= zeroD.yes.data)
        • ./data/zeroD.yes.data.type.P (must be 0)
        • ./data/zeroD.yes.data.type.S (must be 0)
        • ./data/zeroD.yes.data.type.ZS (should be 0)
        • ./data/zeroD.yes.data.type.PS (must be 0)
        • ./data/zeroD.yes.data.type.SS (must be 0)
        • ./data/zeroD.yes.data.type.U (must be 0)

      7: Add negation tag (O) and sort -u for the annual zeroD
      => generates ./data/zeroD.yes.data.${YEAR}

      7.1: Check conflict (inconsistent) tags between spVars
      => generates ./data/zeroD.yes.data.${YEAR}.conflict
      => Ideally, the tag of a zeroD should be the same in both records.
      => This file lists all inconsistent zeroD tags between two records (caused by SpVars).
      => If it is not empty, send it to linguists to tag (yes|no|both) the EUI lines:

      • yes: all tags between these two records should be yes
      • no: all tags between these two records should be no
      • both: tags between these two records could be yes or no (in the past, there have been no "both" tags for zeroD)

      Manually update zeroD.tag.txt with the results, then rerun Steps 4~7.
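The 7.1 conflict check groups rows by record pair rather than by spelling. A hypothetical sketch, assuming each tagged row carries the source and target EUIs, the spellings, and a tag: any EUI pair whose spelling-variant rows carry more than one distinct tag is reported, which is what the linguists then resolve as yes, no, or both. `SpVarConflicts` and `Tagged` are invented names, not part of the GetZeroD tools.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the 7.1 check, NOT DPairTagList.java. Assumes each tagged row
// carries both record EUIs, the spellings used in that row, and its tag.
final class SpVarConflicts {
    record Tagged(String srcEui, String tarEui, String srcSpelling, String tarSpelling, String tag) {}

    /** EUI pairs whose spelling-variant rows carry more than one distinct tag. */
    static Map<String, Set<String>> conflicts(List<Tagged> rows) {
        Map<String, Set<String>> tagsByEuiPair = new LinkedHashMap<>();
        for (Tagged r : rows) {
            // All spelling variants of the same two records share one key.
            tagsByEuiPair.computeIfAbsent(r.srcEui() + "|" + r.tarEui(), k -> new TreeSet<>())
                    .add(r.tag());
        }
        // Consistent pairs (a single tag) are not conflicts.
        tagsByEuiPair.values().removeIf(tags -> tags.size() < 2);
        return tagsByEuiPair;
    }
}
```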

      8: Check affix on zeroD
      => generates zeroD.pattern3.rpt (should be empty)

      9: Run Steps 1 ~ 8 (not recommended)

    • send zeroD.tbd.data to linguists for tagging (yes|no)
    • rerun this process until all zeroD pairs are tagged (zeroD.tbd.data is empty)
    • The final zeroD is in ${DERIVATION}/zeroD/data/${YEAR}/data/zeroD.yes.data.${YEAR}
    • Work on zeroD in the Derivation table growth

    Please refer to derivation design documents in Lexical Tools for details.