The SPECIALIST Lexicon

Derivations Procedures - prefixD

Generate prefixD pairs in derivation table:

I. Directory: ${DERIVATION}/2.prefixD

II. Input Files (./data/${YEAR}/dataOrg/):
shell> ${PREFIX_D}/bin/GetPrefixD ${YEAR}
0   (select step 0: prepare directories and files)

  • inflVars.data:
    => link inflVars.data to ./inflVars.data.${YEAR} (from ${LEXICON_DIR})
  • LEXICON:
    => link LEXICON to ./LEXICON.${YEAR} (from ${LEXICON_DIR}/LEXICON.release): not needed
  • link prefixD.tag.txt to ./prefixD.tag.txt.${YEAR} (from ../../${PREV_YEAR}/dataOrg/prefixD.tag.txt.${YEAR})
  • link prefixList.data to ./dataOrg/prefixList.data.${YEAR} (copy/update from previous year)

  • link prefixD.meta.data.conflict.tag.data to ./dataOrg/prefixD.meta.data.conflict.tag.data.${YEAR}
    => at the init phase, just touch/create an empty ./dataOrg/prefixD.meta.data.conflict.tag.data.${YEAR}
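The linking convention used in this section (a year-stamped file plus an unversioned link that points to it) can be sketched as follows. This is only an illustration, not the project's setup script: the helper name `link_year_file` and the example year are assumptions, and only the file name comes from the list above.

```python
import os
import tempfile
from pathlib import Path


def link_year_file(data_org: Path, name: str, year: str, create: bool = False) -> Path:
    """Point data_org/<name> at the year-stamped file <name>.<year>.

    Hypothetical helper that mirrors `ln -sf ./<name>.<year> <name>`.
    With create=True, an empty year-stamped file is created first (like `touch`).
    """
    target = data_org / f"{name}.{year}"
    if create:
        target.touch()
    if not target.exists():
        raise FileNotFoundError(f"missing input file: {target}")
    link = data_org / name
    if link.is_symlink() or link.exists():
        link.unlink()  # emulate the -f (force) flag of ln
    # Use a relative link target so the directory can be moved as a whole.
    os.symlink(target.name, link)
    return link


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_org = Path(tmp)
        # The conflict tag file starts out empty at the init phase.
        link = link_year_file(
            data_org, "prefixD.meta.data.conflict.tag.data", "2025", create=True
        )
        print(link.name, "->", os.readlink(link))
```

A relative link target is used here so that the year directory stays self-contained; whether the real setup uses relative or absolute links is not stated in this document.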

III. Final files for allD (release)

  • ${TAR_DIR}/prefixD.yes.data.${YEAR}

IV. Summary of GetPrefixD

Step | Description and Program | Input | Output | Notes
Step 0
  Description and program:
    • Prepare directories and files
  Input: See section II.
  Output: See section II.
  Notes:
    • 2.prefixD/data/${YEAR}/dataOrg
      • inflVars.data
      • prefixD.tag.txt
        => Make sure prefixD.tag.txt.${YEAR} exists from the previous dataOrg
      • prefixList.data (update)
      • prefixD.meta.data.conflict.tag.data

Step 1
  Description and program:
    • Get valid prefix base forms from LEXICON
    • GetBaseForms.java
  Input:
    • ${SRC_DIR}:
      • inflVars.data
  Output:
    • bases.data

Step 2
  Description and program:
    • Retrieve all raw prefixD pairs
    • GetPrefixFromBaseFile.java
  Input:
    • ${SRC_DIR}:
      • prefixList.data
    • ${TAR_DIR}:
      • bases.data
  Output:
    • prefixD.raw.data.all
    • prefixD.rawNo.rpt.all
  Notes:
    • This step retrieves all prefixD (including DONE-${YEAR} and TBD).
    • Use the results from step 8 (next step) for release, new prefix

Step 8
  Description and program:
    • Retrieve raw prefixD pairs for this release
    • GetPrefixFromBaseFile.java
    • Selection: 8, then DONE
  Input:
    • ${SRC_DIR}:
      • prefixList.data
    • ${TAR_DIR}:
      • bases.data
  Output:
    • prefixD.raw.data.DONE
    • prefixD.rawNo.rpt.DONE
  Notes:
    • This step provides options for which prefixes to retrieve:
      • TBD: all prefixD that are marked as TBD
      • DONE: all prefixD excluding TBD (used for release)
        => The result is linked to prefixD.raw.data and used for release
      • prefix: all prefixD for the specified prefixes

Step 3
  Description and program:
    • Add tags to prefixD meta file (meta)
    • GetPrefixMetaFile.java
    • DPairTagList
  Input:
    • ${SRC_DIR}:
      • prefixD.tag.txt
    • ${TAR_DIR}:
      • prefixD.raw.data
        Link to ./prefixD.raw.data.DONE (from Step 8)
  Output:
    • prefixD.meta.data
    • prefixD.meta.data.conflict
      The conflict file should be empty
  Notes:
    • The conflict file (prefixD.meta.data.conflict) lists all inconsistent prefixD tags between SpVars in two records
    • Ideally, all prefixD tags should be consistent among SpVars between records
    • In the initial (1st) run, before tags are added for the annual updates, no conflict should exist.
    • If prefixD.meta.data.conflict is not empty, send the conflicts to a linguist to tag [yes|no|both] on the EUI lines
      • [yes]: all prefixD tags among SpVars between records are valid
      • [no]: all prefixD tags among SpVars between records are invalid
      • [both]: prefixD tags among SpVars between records include valid and invalid (exception)
    • Update the tag result to ./dataOrg/prefixD.meta.data.conflict.tag.data
    • In the past, there have been no [both] cases (exceptions) in prefixD
    • If spVar conflicts exist, run the next step (14) to update the results to prefixD.tag.txt automatically, then re-run this step (3). Otherwise, go to Step 4.

Step 14
  Description and program:
    • Auto-fix prefixD.tag.txt for conflicts of dPair tags by SpVars
    • FixConflictDPairTags.java
      => Used to fix inconsistent tags between spVars automatically
  Input:
    • ${SRC_DIR}:
      • prefixD.tag.txt.${YEAR}
      • prefixD.meta.data.conflict.tag.data
  Output:
    • ${SRC_DIR}:
      • prefixD.tag.txt.${YEAR}.fixDPair
  Notes:
    • Make sure to update the linguist tagging result to ./dataOrg/prefixD.meta.data.conflict.tag.data before running this step
    • Manually examine ./dataOrg/prefixD.tag.txt.${YEAR}.fixDPair
      => it should not have any more conflicting tags between spVars
    • Update prefixD.tag.txt.${YEAR} with the latest tags from Step 14
    • If prefixD.tag.txt.${YEAR}.fixDPair passes the examination, move it to prefixD.tag.txt.${NEXT_YEAR}
      => then, re-run Step 3.

Step 4
  Description and program:
    • Split tags on prefixD meta file [yes|no|tbd|tbt]
    • SplitPrefixDMetaFile.java
  Input:
    • ${SRC_DIR}:
      • prefixList.data
    • ${TAR_DIR}:
      • prefixD.meta.data
  Output:
    • prefixD.yes.data
    • prefixD.no.data
    • prefixD.tbd.data
    • prefixD.tbt.data (= prefixD.tbd.data + prefix in the 1st field)
    • prefixD.yesNo.data
  Notes:
    • Make sure prefixD.tbt.data is empty. If not, send it to linguists to tag:
      • Tag prefixD: [yes|no]
        • valid prefixD: [yes]

          Tag negation: [O|N] if prefix is: a-, an-, de-, dys-, in-, under-

          • Negative: [N]
          • Otherwise: [O]
        • invalid prefixD: [no]
    • After tagging, manually update/append the tagged results to ./dataOrg/prefixD.tag.txt
      • Copy prefixD.tbt.data.tagged.txt to ./dataOrg/Tags/
      • Append the tagged results to the end of ./dataOrg/prefixD.tag.txt.${YEAR}
      • ln -sf ./prefixD.tag.txt.${YEAR} prefixD.tag.txt
    • Go to Step 4a

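The split performed in Step 4 can be pictured with the sketch below. It is not SplitPrefixDMetaFile.java: the function name `split_by_tag` is hypothetical, and it assumes that each record is pipe-delimited with the tag in its last non-empty field, which this document does not spell out for prefixD.meta.data.

```python
from collections import defaultdict

VALID_TAGS = ("yes", "no", "tbd", "tbt")


def split_by_tag(lines):
    """Group pipe-delimited records by the tag in their last field.

    Assumption: the tag is the last non-empty field of each record.
    """
    buckets = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        tag = line.rstrip("|").split("|")[-1]
        if tag not in VALID_TAGS:
            raise ValueError(f"unexpected tag {tag!r} in: {line}")
        buckets[tag].append(line)
    return buckets


if __name__ == "__main__":
    records = [
        # Record taken from the NegTagErr log example in this document.
        "a|avascularize|verb|E0566090|vascularize|verb|E0064067|yes",
        # Made-up records (placeholder words and EUIs) for illustration only.
        "x|xfoo|noun|E0000001|foo|noun|E0000002|no",
        "x|xbar|noun|E0000003|bar|noun|E0000004|tbt",
    ]
    buckets = split_by_tag(records)
    for tag in VALID_TAGS:
        print(f"prefixD.{tag}.data: {len(buckets[tag])} line(s)")
    if buckets["tbt"]:
        print("prefixD.tbt.data is not empty: send to linguists for tagging")
```

Raising on an unknown tag, rather than silently dropping the record, matches the spirit of the procedure, where every pair must end up tagged.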
Step 4a
  Description and program:
    • Clean up tags on tagged file
    • CleanUpPDPairTagList.java
  Input:
    • ${SRC_DIR}:
      • prefixD.tag.txt
  Output:
    • ${SRC_DIR}:
      • prefixD.tag.txt.cleanUp
  Notes:
    Re-run this step until the following three numbers at the end of the log file are all 0:
    • duplicate = 0

      If not, replace prefixD.tag.txt with prefixD.tag.txt.cleanUp
      (the cleanUp file removes duplicates).

    • conflict = 0

      If not, send the conflicts (from log.4a) to linguists to re-tag, then fix them in prefixD.tag.txt.
      Do NOT replace prefixD.tag.txt with prefixD.tag.txt.cleanUp until conflict = 0

    • diff = 0

      If not, replace prefixD.tag.txt with prefixD.tag.txt.cleanUp

    Then, rerun Steps 3~4 until the above three numbers are 0 and prefixD.tbt.data is empty in Step 4.

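The decision rule in the Step 4a notes can be written down as a small function. This is a sketch of the rule as stated there, not part of CleanUpPDPairTagList.java; the function name and the returned action strings are hypothetical.

```python
def next_action(duplicate: int, conflict: int, diff: int) -> str:
    """Decide what to do after reading the three counters at the end of the log.

    Returns one of three hypothetical action names:
      "retag":   send conflicts to linguists; do NOT replace prefixD.tag.txt yet
      "replace": replace prefixD.tag.txt with prefixD.tag.txt.cleanUp and re-run
      "done":    all three counters are 0
    """
    if conflict != 0:
        # Conflicts take precedence: the cleanUp file must not be used
        # until conflict = 0.
        return "retag"
    if duplicate != 0 or diff != 0:
        return "replace"
    return "done"


if __name__ == "__main__":
    for counters in [(3, 0, 3), (0, 2, 0), (1, 1, 0), (0, 0, 0)]:
        print(counters, "->", next_action(*counters))
```

Checking the conflict counter first encodes the "Do NOT replace ... until conflict = 0" rule, so a run with both duplicates and conflicts is routed to re-tagging.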
Step 5
  Description and program:
    • Verify dType on prefixD.yes.data
    • DType.java
  Input:
    • ${ALL_SRC_DIR}:
      • LRSPL
      • dTypeStr.data
    • ${TAR_DIR}:
      • prefixD.yes.data
  Output:
    • prefixD.yes.data.type
    • prefixD.yes.data.type.Z
    • prefixD.yes.data.type.S
    • prefixD.yes.data.type.P
    • prefixD.yes.data.type.ZS
    • prefixD.yes.data.type.SS
    • prefixD.yes.data.type.PS
    • prefixD.yes.data.type.U
  Notes:
    • Make sure the unknown dType (|U|) file from prefixD is empty.

Step 6
  Description and program:
    • Add negation tag (N|O), sort uniquely
    • AddNegationTagToFile.java
    • DPairTagList.java
  Input:
    • ${SRC_DIR}:
      • prefixList.data
      • prefixD.tag.txt
    • ${TAR_DIR}:
      • prefixD.yes.data
  Output:
    • prefixD.yes.data.${YEAR}
    • prefixD.yes.data.${YEAR}.conflict
  Notes:
    • Check the output log for any missing tags.
      • ** NegTagErr (43305): a|avascularize|verb|E0566090|vascularize|verb|E0064067|yes ...
      • -- Error negTag no (must be 0): x
    • The conflict file (./data/prefixD.yes.data.${YEAR}.conflict) lists all inconsistent negation tags between SpVars in two records
      • Send the conflicts to a linguist to tag (N|O) on the EUI lines
      • No prefixD pair should be B (Both) in negation (even if the prefix is of class B)
      • In the past, some (6) conflicts needed to be corrected, as shown below:
        • anti- is O (not N) when it has spVar of ante-
          antebrachium|noun|E0072172|brachium|noun|E0013901|O|
          antibrachium|noun|E0072172|brachium|noun|E0013901|N|

          antebrachial|adj|E0203565|brachial|adj|E0013883|O|
          antibrachial|adj|E0203565|brachial|adj|E0013883|N|

        • im- is O (not N) when it has spVar of em-
          empanel|verb|E0024983|panel|noun|E0045258|O|
          impanel|verb|E0024983|panel|noun|E0045258|N|

          embower|verb|E0580659|bower|verb|E0790464|O|
          imbower|verb|E0580659|bower|verb|E0790464|N|

          embower|verb|E0580659|bower|noun|E0434097|O|
          imbower|verb|E0580659|bower|noun|E0434097|N|

        • dis- is O (not N) when it has spVar of di-
          disyllable|noun|E0523982|syllable|noun|E0059482|O|
          dissyllable|noun|E0523982|syllable|noun|E0059482|N|

        => These cases (6 in 2025) re-occur yearly because the prefixes are of class O or N and have SpVars, so the negation is assigned by the program (not tagged manually).
      • Use step 15 to auto-fix negation conflicts
        Copy ./data/prefixD.yes.data.${YEAR}.conflict to ./dataOrg/prefixD.yes.data.${YEAR}.conflict.tag.data

        Add the tag (O) to each conflicting prefixD pair
        This file is used in the next step (15).
        This updates the negation tags in ./dataOrg/prefixD.yes.data.${YEAR}.conflict.tag.data from the previous year's file (to fix the re-occurring conflicts). Note that the line number (1st field) might be different!
      • Also, if new negation conflicts are found, add the linguist's tags for them from ./data/prefixD.yes.data.${YEAR}.conflict to ./dataOrg/prefixD.yes.data.${YEAR}.conflict.tag.data (if any) before running Step 15
      • Update prefixD tag file:
        cp ./dataOrg/prefixD.tag.txt.${YEAR}.fixDPair ./dataOrg/prefixD.tag.txt.${YEAR}
      • Use Steps 15-16 to update the results to prefixD.yes.data.${YEAR}

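To make the notion of a negation conflict concrete, the sketch below groups records by their pair of EUIs and reports every pair that carries more than one negation tag. It is not AddNegationTagToFile.java. The record layout (derived word, category, EUI, base, category, EUI, negation) is read off the example lines above; the function name and the choice to key on the two EUIs in sorted order are assumptions.

```python
from collections import defaultdict


def find_negation_conflicts(lines):
    """Return the sorted list of EUI pairs that have more than one negation tag."""
    tags_by_pair = defaultdict(set)
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        fields = line.rstrip("|").split("|")
        if len(fields) != 7:
            raise ValueError(f"expected 7 fields, got {len(fields)}: {line}")
        derived_eui, base_eui, negation = fields[2], fields[5], fields[6]
        # Assumption: a pair is identified by its two EUIs, in sorted order.
        pair = tuple(sorted((derived_eui, base_eui)))
        tags_by_pair[pair].add(negation)
    return sorted(pair for pair, tags in tags_by_pair.items() if len(tags) > 1)


if __name__ == "__main__":
    records = [
        "antebrachium|noun|E0072172|brachium|noun|E0013901|O|",
        "antibrachium|noun|E0072172|brachium|noun|E0013901|N|",
        "antebrachial|adj|E0203565|brachial|adj|E0013883|O|",
        "antibrachial|adj|E0203565|brachial|adj|E0013883|N|",
        "empanel|verb|E0024983|panel|noun|E0045258|O|",
        "impanel|verb|E0024983|panel|noun|E0045258|N|",
        "embower|verb|E0580659|bower|verb|E0790464|O|",
        "imbower|verb|E0580659|bower|verb|E0790464|N|",
        "embower|verb|E0580659|bower|noun|E0434097|O|",
        "imbower|verb|E0580659|bower|noun|E0434097|N|",
        "disyllable|noun|E0523982|syllable|noun|E0059482|O|",
        "dissyllable|noun|E0523982|syllable|noun|E0059482|N|",
    ]
    for number, pair in enumerate(find_negation_conflicts(records), start=1):
        print(f"{number}|{pair[0]}|{pair[1]}|")
```

Run on the twelve example records, this reports six conflicting EUI pairs, one per pair of spelling variants listed in the notes.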
Step 15
  Description and program:
    • Auto-fix prefixD.tag.txt for conflicts of negation tags by SpVars for class B
      • fix negation conflicts for class B
      • list possible negation conflicts for classes O|N
    • FixConflictNegationTags.java
  Input:
    • ${SRC_DIR}:
      • prefixD.tag.txt.${YEAR}
      • prefixList.data
      • prefixD.yes.data.${YEAR}.conflict.tag.data
  Output:
    • ${SRC_DIR}:
      • prefixD.tag.txt.${YEAR}.fixNegation
    • ${TAR_DIR}:
      • prefixD.negation.fix.data
  Notes:
    • Manually examine ./dataOrg/prefixD.tag.txt.${YEAR}.fixNegation
      => check that the conflicts from Step 6 do not have negation tags (O|N), because they will be tagged in the next step (16)
    • If it is OK, move this file to ./dataOrg/prefixD.tag.txt.${YEAR}

Step 16
  Description and program:
    • Auto-fix prefixD.yes.data.${YEAR} for conflicts of negation tags by SpVars for classes N and O from Step 15
    • FixConflictNegationForClassNandO.java
  Input:
    • ${TAR_DIR}:
      • prefixD.yes.data.${YEAR}
      • prefixD.negation.fix.data
  Output:
    • prefixD.yes.data.${YEAR}.fixNegation
      => move to prefixD.yes.data.${YEAR}
  Notes:
    • Manually examine ./data/prefixD.yes.data.${YEAR}.fixNegation (for the negation changes (N|O) made by the program)
      => Check that it fixes the negation for class O|N in the (6) negation conflict cases above
      => diff prefixD.yes.data.${YEAR}.fixNegation prefixD.yes.data.${YEAR}
    • If it is OK,
      • mv ./data/prefixD.yes.data.${YEAR} ./data/prefixD.yes.data.${YEAR}.beforeFixNegation
      • cp -p ./data/prefixD.yes.data.${YEAR}.fixNegation ./data/prefixD.yes.data.${YEAR}
      • cp -p ./dataOrg/prefixD.tag.txt.${YEAR}.fixNegation ./dataOrg/prefixD.tag.txt.${NEXT_YEAR}

Step 7
  Description and program:
    • Check affix on prefixD.yes.data.${YEAR}
    • CheckDerivationByAffix6.java
  Input:
    • ${ALL_SRC_DIR}:
      • LRSPL
    • ${SRC_DIR}:
      • prefixD.tagYes.txt
        => copy from the previous year (empty file)
    • ${TAR_DIR}:
      • prefixD.yes.data.${YEAR}
  Output:
    • prefixD.pattern3.rpt
  Notes:
    • Make sure prefixD.pattern3.rpt is empty. If not, send it to linguists to tag (Yes|No):
      • invalid dPair (No): add to prefixD.tagNo.txt, then rerun Steps 3~7
      • valid dPair (Yes): add to prefixD.tagYes.txt, then rerun Step 7

Step 11
  Description and program:
    • Steps 1 ~ 7
  Input: See above
  Output: See above
  Notes:
    • Not recommended!

V. Processes Details:

  • shell>cd ${DERIVATION}/prefixD/bin
  • shell>GetPrefixD ${YEAR}

    1. Routine process (no new PD-Rules, no new Tag)

    1: Get valid prefix base forms from LEXICON
    => generates ./data/bases.data

    2: Retrieve raw prefixD pairs
    or use
    8: Retrieve possible raw prefixD pairs with options
    DONE: retrieve all prefixes whose tagging is done (not TBD)
    => generates:

    • ./data/prefixD.raw.data
    • ./data/prefixD.rawNo.rpt

    3: Add tags to prefixD meta file
    => generates ./data/prefixD.meta.data
    Every entry must be tagged [yes|no], and all errors must be fixed.
    Use the tag tbd to bypass an entry with tagging errors.

    3.1: Check conflicts by SpVars (different dPair tags between 2 records).
    => generates ./data/prefixD.meta.data.conflict
    Send to linguist to double check "[yes|no|both]"
    => Ideally, the tag of prefixD between two records should be the same
    => This file lists all inconsistent prefixD tags between two records (caused by SpVars).
    => If not empty, send it to a linguist to tag [yes|no|both] on the EUI line.

    • yes: all tags between these two records should be yes
    • no: all tags between these two records should be no
    • both: tags between these two records could be yes or no. In the past, no tag of both for prefixD

    => manually update this result to prefixD.tag.txt and rerun step 3 ~ 6.

    14: Auto-fix prefixD.tag.txt for conflicts by SpVars
    => Put the revised tagged file to: ./dataOrg/prefixD.meta.data.conflict.tag.data
    => copy ./dataOrg/prefixD.tag.txt.${YEAR}.fix to ./dataOrg/prefixD.tag.txt.${YEAR} and rerun this step.

    4: Split prefixD meta file
    => generates

    • ./data/prefixD.yesNo.data
    • ./data/prefixD.yes.data
    • ./data/prefixD.no.data
    • ./data/prefixD.tbd.data (tbt + tbd prefixes)
    • ./data/prefixD.tbt.data (to be tagged => annual update dPairs)

    Make sure prefixD.tbt.data is empty. If not, send it to linguists to tag:

    • Tag prefixD: [yes|no]
      • valid prefixD: yes

        Tag negation: (O|N) if prefix is: a-, an-, de-, dys-, in-, under-

        • Negative: N
        • Otherwise: O
      • invalid prefixD: no
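The tagging rule above can be expressed as a small check on one linguist-tagged entry. This is a hedged sketch: the function name `check_tagging` and its three-argument form are hypothetical, since the layout of the tagged file is not described here; only the tag values and the list of prefixes come from the text.

```python
# Prefixes for which a valid (yes) pair also needs a negation tag, per the list above.
NEGATION_PREFIXES = {"a", "an", "de", "dys", "in", "under"}


def check_tagging(prefix, tag, negation=None):
    """Return a list of problems for one tagged entry (empty list means OK).

    prefix:   the prefix without its trailing hyphen, e.g. "dys"
    tag:      "yes" (valid prefixD) or "no" (invalid prefixD)
    negation: "N" (negative), "O" (otherwise), or None if not tagged
    """
    problems = []
    if tag not in ("yes", "no"):
        problems.append(f"tag must be yes or no, got {tag!r}")
    elif tag == "yes" and prefix in NEGATION_PREFIXES and negation not in ("N", "O"):
        problems.append(f"prefix {prefix}- needs a negation tag N or O, got {negation!r}")
    return problems


if __name__ == "__main__":
    print(check_tagging("a", "yes", "N"))   # complete entry
    print(check_tagging("dys", "yes"))      # negation tag missing
    print(check_tagging("under", "no"))     # invalid pair, no negation needed
```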

    5: Verify dType on prefixD.yes.data
    => generates ./data/prefixD.yes.data.type

    • ./data/prefixD.yes.data.type.Z (must be 0)
    • ./data/prefixD.yes.data.type.P (should = ./data/prefixD.yes.data)
    • ./data/prefixD.yes.data.type.ZS (must be 0)
    • ./data/prefixD.yes.data.type.PS (should = 0)
    • ./data/prefixD.yes.data.type.SS (must be 0)
    • ./data/prefixD.yes.data.type.U (must be 0)
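The expectations in this list can be checked mechanically once the line counts are known. The sketch below is not DType.java: the function name is hypothetical, and it assumes that "should = ./data/prefixD.yes.data" means the two files have the same number of lines. The counts in the example are made up.

```python
# Type files that the list above says must be (or should be) empty.
MUST_BE_ZERO = ("Z", "ZS", "PS", "SS", "U")


def check_dtype_counts(counts, yes_total):
    """Return a list of problems found in the per-type line counts.

    counts:    mapping from dType suffix (e.g. "P", "Z") to the number of
               lines in ./data/prefixD.yes.data.type.<suffix>
    yes_total: number of lines in ./data/prefixD.yes.data
    """
    problems = []
    p_count = counts.get("P", 0)
    if p_count != yes_total:
        problems.append(f"type.P has {p_count} lines, expected {yes_total}")
    for suffix in MUST_BE_ZERO:
        n = counts.get(suffix, 0)
        if n != 0:
            problems.append(f"type.{suffix} has {n} lines, expected 0")
    return problems


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(check_dtype_counts({"P": 10}, yes_total=10))
    print(check_dtype_counts({"P": 9, "U": 1}, yes_total=10))
```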

    6: Add negation tag (N|O); the output is sorted uniquely inside the program (not by sort -u)
    => generates ./data/prefixD.yes.data.${YEAR}
    Negation tagging errors must be fixed
    => send to linguist to tag the negation (N|O)

    6.1: Check conflict (inconsistent) tags between SpVars
    => generates ./data/prefixD.yes.data.${YEAR}.conflict
    => Ideally, the prefixD tag between two records should be the same
    => SpVars might also cause inconsistent negation tags on prefixD
    => Ideally, the negation tag between two records should be the same
    => If not empty, send it to a linguist to tag (N|O|B) on the EUI line.
    => The negation could have exceptions:

    • anti- is O (not N) when it has spVar of ante-
    • im- is O (not N) when it has spVar of em-
    • dis- is O (not N) when it has spVar of di-


    => manually update this result to prefixD.yes.data.${YEAR}
    => The final prefix is in ${DERIVATION}/prefixD/data/${YEAR}/data/prefixD.yes.data.${YEAR}

    15: Auto-fix prefixD.tag.txt for negation conflicts by SpVars
    => Put the revised tagged file to: ./dataOrg/prefixD.yes.data.${YEAR}.conflict.tag.data
    Known cases in 2015 are:

    1|E0013901|E0072172|
    # 556|antebrachium|noun|E0072172|brachium|noun|E0013901|O|
    # 1431|antibrachium|noun|E0072172|brachium|noun|E0013901|N|
    2|E0013883|E0203565|
    # 557|antebrachial|adj|E0203565|brachial|adj|E0013883|O|
    # 1432|antibrachial|adj|E0203565|brachial|adj|E0013883|N|
    3|E0024983|E0045258|
    # 11245|empanel|verb|E0024983|panel|noun|E0045258|O|
    # 15077|impanel|verb|E0024983|panel|noun|E0045258|N|
    4|E0434097|E0580659|
    # 11243|embower|verb|E0580659|bower|noun|E0434097|O|
    # 15072|imbower|verb|E0580659|bower|noun|E0434097|N|
    5|E0059482|E0523982|
    # 9310|disyllable|noun|E0523982|syllable|noun|E0059482|O|
    # 10500|dissyllable|noun|E0523982|syllable|noun|E0059482|N|
    	

    => copy ./dataOrg/prefixD.tag.txt.${YEAR}.fixNegation to ./dataOrg/prefixD.tag.txt.${YEAR} and rerun this step.
    => Check the log; run Step 16 if a possible negation fix exists

    16: Auto-fix prefixD.yes.data.${YEAR} for negation conflicts by SpVars for classes N and O
    => Check that the fix file exists: ./data/prefixD.negation.fix.data
    => copy ./data/prefixD.yes.data.${YEAR}.fixNegation to ./data/prefixD.yes.data.${YEAR}

    7: Check affix on prefixD.yes.data.${YEAR}
    => generates ./data/prefixD.pattern3.rpt (should be empty)

    11: Run above 1-7 steps (default)
    => above steps from 1 ~ 7

    2. Add new PD-Rules process
    8: Retrieve possible raw prefixD pairs with options

    • use the option ${PREFIX} to generate all prefixD pairs for a specified prefix (check the prefixD.rawNo.rpt.${PREFIX})
    • send to linguists for tagging (see below)
    • add the newly tagged dPair results to ./dataOrg/prefixD.tag.txt
    • update ./dataOrg/prefixList.data (so that the prefix is tagged)
    • use DONE to retrieve all prefixes that are not TBD

    Same procedures as above (regular)
    3: Add tags to prefixD meta file
    4: Split prefixD meta file
    5: Verify dType on prefixD.yes.data
    6: Add negation tag (N|O)
    7: Compare original tag and result tag files

    3. Add tag for new prefix dPairs (annual updates)

    • send ./data/prefixD.tbt.data to linguists for tagging:
      • derivation: [yes|no]
      • negation: O|N (if tagged yes and the prefix is: a-, an-, de-, dys-, in-, under-)
    • Append new tagging results to ./dataOrg/prefixD.tag.txt
    • re-run this process until all prefixD are tagged (0 in prefixD.tbt.data)
  • Update numbers of prefixD in the Derivation table growth

Update prefixD growth

Please refer to derivation design documents in Lexical Tools for details.