The SPECIALIST Lexicon

Derivations Procedures - nomD

This step includes:

  • Generate nomD (derivation from nominalization in Lexicon)
  • Automatically tag zeroD and suffixD to add to derivation table

I. Directory:

  • ${DERIVATION}/1.nomD

II. Input Files (./data/${YEAR}/dataOrg/):
shell> ${NOM_D}/bin/GetNomD ${YEAR}
0

  • The following procedures are done automatically in Step-0:
    • shell> cd ${NOM_D}/data
    • shell> mkdir -p ${YEAR}/dataOrg

    • LRNOM:
      => link LRNOM to LRNOM.${YEAR} from new release (${LEXICON}/data/tables)
    • prepositions.data:
      Get the latest prepositions if LEXICON.release.${YEAR} is ready
      shell> cd ${LC-proc}/bin/GetFilesFromLexicon
      2
      3
      12
      13
      => copy prepositions.data.${YEAR} generated from ${LEX_CHECK}/data/Files/prepositions.data.${YEAR}
      => This file should include all prepositions from previous year plus new prepositions
    • nomD.tagNo.txt
      => copy nomD.tagNo.txt.${PREV_YEAR} to nomD.tagNo.txt.${YEAR}
      => might need to be updated in step-3
    • nomD.tagYes.txt
      => copy nomD.tagYes.txt.${PREV_YEAR} to nomD.tagYes.txt.${YEAR}
      => might need to be updated in step-5

    • LRSPL:
      => link LRSPL to ./5.allD/data/${YEAR}/dataOrg/LRSPL.${YEAR} from new release (${LEXICON}/data/tables)
    • dTypeStr.data:
      => copy dTypeStr.data.${PREV_YEAR} to dTypeStr.data.${YEAR}
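
The copy operations listed above for the tag files can be sketched in shell as follows. This is only an illustration and not the GetNomD code: the function name carry_forward_tags is hypothetical, and the sketch assumes that the previous year's tag files live in ${NOM_D}/data/${PREV_YEAR}/dataOrg, which the list above does not state.

```shell
# Hypothetical helper (not part of GetNomD): create this year's dataOrg
# directory and carry the linguist-maintained tag files forward.
# The body runs in a subshell "( ... )" so its variables stay local.
carry_forward_tags() (
    nom_d=$1; year=$2; prev_year=$3
    org="$nom_d/data/$year/dataOrg"
    # Assumption: last year's files sit in the matching dataOrg directory.
    prev_org="$nom_d/data/$prev_year/dataOrg"

    mkdir -p "$org" || exit 1
    for name in nomD.tagNo.txt nomD.tagYes.txt; do
        # e.g. nomD.tagNo.txt.${PREV_YEAR} -> nomD.tagNo.txt.${YEAR}
        cp "$prev_org/$name.$prev_year" "$org/$name.$year" || exit 1
    done
)

# Usage (paths depend on the local setup):
#   carry_forward_tags "${NOM_D}" "${YEAR}" "${PREV_YEAR}"
```

The copied files are only a starting point; as noted above, they might need to be updated in later steps.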

III. Result files used in allD

  • ${TAR_DIR}/nomD.yes.Z.data.${YEAR}
  • ${TAR_DIR}/nomD.yes.S.data.${YEAR}

IV. Summary of GetNomD

Each step below lists: Description and Program, Input, Output, Notes.

0
  • Description and Program:
    • Prepare directories and files
  • Input: See section II.
  • Output: See section II.
    • 1.nomD/data/${YEAR}/dataOrg
      • LRNOM
      • prepositions.data (see description above for update)
      • nomD.tagNo.txt
      • nomD.tagYes.txt
    • 5.allD/data/${YEAR}/dataOrg
      • LRSPL
      • dTypeStr.data
1
  • Description and Program:
    • Retrieve raw nomD pairs
    • GetNomDRawFromNomFile.java
  • Input:
    • ${SRC_DIR}:
      • LRNOM
  • Output:
    • nomD.raw.data
2
  • Description and Program:
    • Add tag (yes|no) to nomD
    • GetNomDMetaFile.java
  • Input:
    • ${SRC_DIR}:
      • nomD.tagNo.txt
      • prepositions.data
    • ${TAR_DIR}:
      • nomD.raw.data
  • Output:
    • nomD.meta.data
    • nomD.yes.data
    • nomD.no.data
  • Notes:
    • Might need to rerun if Step-3 finds some invalid dPairs from nomD
3
  • Description and Program:
    • Add dType (P|Z|S|PS|ZS|SS|U)
    • DType.java
  • Input:
    • ${ALL_SRC_DIR}:
      • LRSPL
      • dTypeStr.data
    • ${TAR_DIR}:
      • nomD.yes.data
  • Output:
    • nomD.yes.data.type
    • nomD.yes.data.type.Z
    • nomD.yes.data.type.S
    • nomD.yes.data.type.P
    • nomD.yes.data.type.ZS
    • nomD.yes.data.type.SS
    • nomD.yes.data.type.PS
    • nomD.yes.data.type.U
    • nomD.yes.data.type.ZandS
  • Notes:
    • Follow the message from the program
    • Make sure nomD.yes.data.type.U is empty. If not, send to linguist to tag [S|Z|No]:
      • invalid dPair (No): add to 1.nomD/data/${YEAR}/dataOrg/nomD.tagNo.txt, then rerun Steps 2~3
      • valid dPair [S|Z]: add to 5.allD/data/${YEAR}/dataOrg/dTypeStr.data.${YEAR}, then rerun Step-3
    • Make sure the word counts of valid nomD are the same (in the message)
4
  • Description and Program:
    • Add negation tag: [O|N], sort
    • AddNegationTagToFile.java
  • Input:
    • ${TAR_DIR}:
      • nomD.yes.data.type.ZandS
      • nomD.yes.data.type.Z
      • nomD.yes.data.type.S
  • Output:
    • nomD.yes.data.${YEAR}
    • nomD.yes.Z.data.${YEAR}
    • nomD.yes.S.data.${YEAR}
  • Notes:
    • Total number of S and Z should = ZandS
5
  • Description and Program:
    • Check affix on nomD.yes.data.${YEAR}
    • CheckDerivationByAffix6.java
  • Input:
    • ${ALL_SRC_DIR}:
      • LRSPL
    • ${SRC_DIR}:
      • nomD.tagYes.txt
    • ${TAR_DIR}:
      • nomD.yes.data.${YEAR}
  • Output:
    • nomD.pattern3.rpt
  • Notes:
    • Make sure nomD.pattern3.rpt is empty. If not, send to linguist to tag (Yes|No):
      • invalid dPair (No): add to nomD.tagNo.txt, then rerun Steps 2~5
      • valid dPair (Yes): add to nomD.tagYes.txt, then rerun Steps 2~5
6
  • Description and Program:
    • Steps 1 ~ 5
  • Input: See above
  • Output: See above
  • Notes: Not recommended!

V. Process details:
Save messages to log.${STEP} in ./Logs/${YEAR}/

  • shell> cd ${DERIVATION}/1.nomD/bin
  • shell> GetNomD ${YEAR}

    0: Prepare directories and files
    => generates: 1.nomD/data/dataOrg/*
    => generates: 5.allD/data/dataOrg/*

    1: Retrieve std-raw nomD pairs
    => generates: ./data/nomD.raw.data

    2: Add tag (yes|no) to nomD: meta, yes, no files
    => requires:

    • ../dataOrg/prepositions.data
      Used by the program to identify invalid dPairs from nomD that match these patterns:
      • xxxparticle|noun|eui1|xxx|verb|eui2
        lookup|noun|E0222422|look|verb|E003804
      • xxx-particle|noun|eui1|xxx|verb|eui2
        grown-up|noun|E0030484|grow|verb|E0030480
    • ../dataOrg/nomD.tagNo.txt
      Used to tag invalid dPairs from nomD that the algorithm above cannot identify

    => generates:
    • ./data/nomD.meta.data
    • ./data/nomD.yes.data
    • ./data/nomD.no.data
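
The preposition-based check in step 2 can be sketched as follows. This is a rough illustration under stated assumptions and not the logic of GetNomDMetaFile.java: the function name is_particle_pair is hypothetical, the prepositions are passed as arguments instead of being read from prepositions.data, and the matching rule (the noun is the verb followed directly by a preposition, or starts with the verb and ends in a hyphen plus a preposition) is inferred only from the two example records above.

```shell
# Hypothetical helper: exit status 0 when a nomD record looks like a
# particle noun, 1 otherwise.
# Record layout (taken from the examples): noun|noun|eui1|verb|verb|eui2
# The body runs in a subshell "( ... )" so its variables stay local.
is_particle_pair() (
    record=$1; shift
    # Field 1 is the noun, field 4 is the verb.
    noun=$(printf '%s\n' "$record" | cut -d'|' -f1)
    verb=$(printf '%s\n' "$record" | cut -d'|' -f4)
    [ -n "$noun" ] && [ -n "$verb" ] || exit 1

    # Remaining arguments are the known prepositions.
    for prep in "$@"; do
        case $noun in
            # Hyphenated form; the stem may be an inflected form of the
            # verb, e.g. grown-up -> grow (assumed rule).
            "$verb"*"-$prep") exit 0 ;;
            # Concatenated form, e.g. lookup -> look.
            "$verb$prep") exit 0 ;;
        esac
    done
    exit 1
)

# Usage:
#   is_particle_pair 'lookup|noun|E0222422|look|verb|E003804' up in on \
#       && echo "invalid dPair (particle noun)"
```

Pairs that such a pattern check cannot catch are the ones that have to be listed by hand in nomD.tagNo.txt.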

    3: Add dType (P|Z|S|PS|ZS|SS|U): Split nomD.yes to (Z) and (S)
    => generates:

    • ./data/nomD.yes.data.type

    • ./data/nomD.yes.data.type.Z
    • ./data/nomD.yes.data.type.S
    • ./data/nomD.yes.data.type.P
      => must be empty
    • ./data/nomD.yes.data.type.ZS (Z by SpVars)
    • ./data/nomD.yes.data.type.SS (S by SpVars)
    • ./data/nomD.yes.data.type.PS (P by SpVars)
      => must be empty
    • ./data/nomD.yes.data.type.U (Unknown)
      => must be empty
      => if not empty, send to linguist to tag (S|Z|No):
      • if invalid (No), add to 1.nomD/../nomD.tagNo.txt
      • if valid (S|Z), add to 5.allD/../dTypeStr.data
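
The "must be empty" conditions in step 3 can be checked with a small shell function. This is a sketch only: report_nonempty is a hypothetical name, and treating a missing file as a failure is an assumption of the sketch, not something the procedure specifies.

```shell
# Hypothetical helper: print every file that is missing or not empty.
# Exit status is 0 only when all given files exist and are empty.
# The body runs in a subshell "( ... )" so its variables stay local.
report_nonempty() (
    status=0
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            # Assumption: a missing file means the step did not run.
            echo "MISSING: $f"
            status=1
        elif [ -s "$f" ]; then
            # -s is true when the file has a size greater than zero.
            echo "NOT EMPTY: $f"
            status=1
        fi
    done
    exit "$status"
)

# Usage, with the file names from step 3:
#   report_nonempty ./data/nomD.yes.data.type.P \
#                   ./data/nomD.yes.data.type.PS \
#                   ./data/nomD.yes.data.type.U
```

Any file reported as not empty still has to be reviewed as described above; the function only points out where to look.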

    4: Add negation tag: (O|N), sort uniquely
    => generates:

    • ./data/nomD.yes.data.${YEAR} (only includes S and Z)
    • ./data/nomD.yes.S.data.${YEAR}
    • ./data/nomD.yes.Z.data.${YEAR}
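
The summary in section IV notes that the total number of S and Z should equal ZandS. A line-count comparison along these lines can confirm it. This is a sketch under assumptions: check_split_counts is a hypothetical name, each record is assumed to occupy exactly one newline-terminated line, and the program's own message remains the authoritative count.

```shell
# Hypothetical helper: compare line counts of the Z, S and combined files.
# Arguments: Z file, S file, combined (ZandS) file.
# The body runs in a subshell "( ... )" so its variables stay local.
check_split_counts() (
    for f in "$1" "$2" "$3"; do
        [ -r "$f" ] || { echo "cannot read: $f" >&2; exit 2; }
    done
    # wc -l counts newline characters; $(( )) strips any padding that
    # some wc implementations put in front of the number.
    z=$(( $(wc -l < "$1") ))
    s=$(( $(wc -l < "$2") ))
    both=$(( $(wc -l < "$3") ))
    if [ $((z + s)) -eq "$both" ]; then
        echo "OK: Z($z) + S($s) = ZandS($both)"
    else
        echo "MISMATCH: Z($z) + S($s) != ZandS($both)"
        exit 1
    fi
)

# Usage, with the step 3 file names:
#   check_split_counts ./data/nomD.yes.data.type.Z \
#                      ./data/nomD.yes.data.type.S \
#                      ./data/nomD.yes.data.type.ZandS
```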

    5: Check affix on nomD.yes.data.${YEAR}
    => generates:

    • ./data/nomD.pattern3.rpt
      Should be empty (the program message should report the number of possible invalid nomD as 0)
      => if not empty, send to linguist to tag (Yes|No):
      • if invalid: add to ${1.nomD}/data/${YEAR}/dataOrg/nomD.tagNo.txt, and repeat steps 2 ~ 5
      • if valid: add to ${1.nomD}/data/${YEAR}/dataOrg/nomD.tagYes.txt, and repeat steps 2 ~ 5
    or

    6: Run steps 1 ~ 5 above

  • Compare nomD.no.data to the previous year's file and validate the differences
  • The final nomD files (below) are used in zeroD and suffixD for auto-tagging:
    • ${1.nomD}/data/${YEAR}/data/nomD.yes.Z.data.${YEAR}
    • ${1.nomD}/data/${YEAR}/data/nomD.yes.S.data.${YEAR}
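
The comparison of nomD.no.data against the previous year can be sketched with sort and comm. This is an illustration only: diff_no_pairs is a hypothetical name, both file paths are supplied by the caller, and validating the listed differences remains a manual task.

```shell
# Hypothetical helper: list records that appear in only one of two files.
# Arguments: previous year's nomD.no.data, current year's nomD.no.data.
# The body runs in a subshell "( ... )" so its variables stay local.
diff_no_pairs() (
    work=$(mktemp -d) || exit 1
    # comm needs sorted input, so sort both files into a scratch directory.
    sort -u "$1" > "$work/prev" && sort -u "$2" > "$work/curr" \
        || { rm -rf "$work"; exit 1; }

    echo "# only in previous year"
    comm -23 "$work/prev" "$work/curr"   # lines unique to the first file
    echo "# only in current year"
    comm -13 "$work/prev" "$work/curr"   # lines unique to the second file
    rm -rf "$work"
)

# Usage (the location of last year's file depends on the local setup):
#   diff_no_pairs /path/to/previous/nomD.no.data ./data/nomD.no.data
```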

Please refer to derivation design documents in Lexical Tools for details.