Each step below lists its description and program, input, output, and notes.
Step 0: Prepare directories and files
- Input: See section II.
- Output: See section II.
- Notes:
  - 4.zeroD/data/${YEAR}/dataOrg
  - LEXICON
  - zeroD.tag.txt
  - zeroD.tagYes.txt
Step 1: Get valid bases from LEXICON
- Program: GetBasesFromLexicon.java
- Notes: Get bases (citations and spVars) from LEXICON, except for:
  - abbreviations
  - acronyms
  - bases shorter than the minimum length of 2
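The Step 1 exclusion rules amount to a simple filter. The sketch below is not GetBasesFromLexicon.java. It assumes a hypothetical, already-parsed `Term` class that carries abbreviation and acronym flags, so it does not read LEXICON itself.

```java
import java.util.ArrayList;
import java.util.List;

final class BaseFilter {
    // Hypothetical, pre-parsed view of one LEXICON term (a citation or a spVar).
    static final class Term {
        final String spelling;
        final boolean abbreviation;
        final boolean acronym;

        Term(String spelling, boolean abbreviation, boolean acronym) {
            this.spelling = spelling;
            this.abbreviation = abbreviation;
            this.acronym = acronym;
        }
    }

    // Minimum base length from the Step 1 notes.
    static final int MIN_LENGTH = 2;

    // Keeps terms that are not abbreviations or acronyms and are at least MIN_LENGTH long.
    static List<String> validBases(List<Term> terms) {
        List<String> bases = new ArrayList<>();
        for (Term t : terms) {
            if (t.abbreviation || t.acronym || t.spelling.length() < MIN_LENGTH) {
                continue;
            }
            bases.add(t.spelling);
        }
        return bases;
    }

    public static void main(String[] args) {
        List<Term> terms = List.of(
                new Term("run", false, false),
                new Term("DNA", false, true),
                new Term("approx", true, false),
                new Term("a", false, false));
        System.out.println(validBases(terms)); // prints [run]
    }
}
```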
Step 2: Retrieve std-raw zeroD pairs
- Program: GetZeroDRawFromBaseFile.java
- Notes: The raw zeroD pairs should include both upper- and lower-case forms.
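One way to picture a raw zeroD pair is two bases with the same spelling but different categories. The sketch below is an assumption-laden illustration, not GetZeroDRawFromBaseFile.java. It assumes a hypothetical `spelling|category` input format, matches spellings case-insensitively while keeping the original case on both sides, and emits both directions of each pair.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class RawZeroDPairs {
    // Input lines are assumed (hypothetically) to be "spelling|category".
    // Output lines are "spelling|category|spelling|category" for every pair of bases
    // whose spellings match case-insensitively but whose categories differ.
    static List<String> rawPairs(List<String> bases) {
        // Group by lower-cased spelling so "Run" and "run" fall into the same bucket.
        Map<String, List<String[]>> groups = new LinkedHashMap<>();
        for (String line : bases) {
            String[] f = line.split("\\|");
            if (f.length < 2) {
                continue; // skip malformed lines
            }
            groups.computeIfAbsent(f[0].toLowerCase(Locale.ROOT), k -> new ArrayList<>()).add(f);
        }
        List<String> pairs = new ArrayList<>();
        for (List<String[]> group : groups.values()) {
            for (String[] a : group) {
                for (String[] b : group) {
                    // Original case is kept on both sides; both directions are emitted.
                    if (!a[1].equals(b[1])) {
                        pairs.add(a[0] + "|" + a[1] + "|" + b[0] + "|" + b[1]);
                    }
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> bases = List.of("run|noun", "Run|verb", "cat|noun");
        rawPairs(bases).forEach(System.out::println);
        // prints:
        // run|noun|Run|verb
        // Run|verb|run|noun
    }
}
```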
Step 3: Combine with the nomD.Z file (raw)
- Program: CheckWithNomDFile.java
- Input:
  - ${NOM_TAR_DIR}:
  - ${TAR_DIR}:
- Output:
  - zeroD.raw.data.fromNomD
  - zeroD.raw.data
- Notes: The log shows all zeroDs from nomD that are not included in the zeroDs from bases.
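The Step 3 log is essentially a set difference between the nomD-derived pairs and the base-derived pairs. The sketch below is not CheckWithNomDFile.java. It assumes pairs are compared as exact strings and that "combine" means a sorted union; both are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class NomDCheck {
    // Sorted union of both pair lists (one possible reading of "combine with nomD").
    static List<String> combine(List<String> fromBase, List<String> fromNomD) {
        Set<String> all = new TreeSet<>(fromBase);
        all.addAll(fromNomD);
        return new ArrayList<>(all);
    }

    // Pairs found via nomD that the base-derived list lacks (what the Step 3 log reports).
    static List<String> missingFromBase(List<String> fromBase, List<String> fromNomD) {
        Set<String> base = new HashSet<>(fromBase);
        List<String> missing = new ArrayList<>();
        for (String pair : new LinkedHashSet<>(fromNomD)) { // de-duplicate, keep order
            if (!base.contains(pair)) {
                missing.add(pair);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> base = List.of("run|noun|run|verb");
        List<String> nomD = List.of("run|noun|run|verb", "walk|noun|walk|verb");
        System.out.println(missingFromBase(base, nomD)); // prints [walk|noun|walk|verb]
        System.out.println(combine(base, nomD)); // prints [run|noun|run|verb, walk|noun|walk|verb]
    }
}
```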
Step 4: Add tags to the zeroD raw file
- Programs: GetZeroDMetaFile.java, DPairTagList.java
- Input:
  - ${NOM_TAR_DIR}:
  - ${SRC_DIR}:
  - ${TAR_DIR}:
- Output:
  - zeroD.meta.data
  - zeroD.meta.data.conflict
- Notes: The following counts must be 0. Check the lines containing "(must" in log.4:
  - `-- Total invalid tag no (must = 0): 0`
  - `-- Empty line no (must = 0): 0`
  - `-- Invalid tag no (must = 0): 0`
    => Sent to linguists in Step 5
  - `- conflict (yes|no) tag no (must = 0): 0`
    => Checks whether the tag on a zeroD is consistent between two records (by SpVars)
    => If not 0, send the file (zeroD.meta.data.conflict) to linguists to verify
    => Use the tagged file to manually update ${SRC_DIR}/zeroD.tag.txt
  - `- none (tbd) tag no (must be 0): 0`
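Checking the "(must ...)" counters by eye is error-prone, so a small scanner can flag any non-zero count. This is a minimal sketch, assuming the log lines look like the examples quoted in the Step 4 notes (`(must = 0): N` or `(must be 0): N`). The class name `MustZeroCheck` is hypothetical and not part of the pipeline. The same check applies to the counters listed for Step 7.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MustZeroCheck {
    // Matches counters such as "(must = 0): 3" or "(must be 0): 3" and captures the count.
    private static final Pattern MUST_ZERO = Pattern.compile("\\(must (?:=|be) 0\\):\\s*(\\d+)");

    // Returns every log line whose "(must ...)" counter is not 0.
    static List<String> violations(List<String> logLines) {
        List<String> bad = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = MUST_ZERO.matcher(line);
            if (m.find() && Long.parseLong(m.group(1)) != 0) {
                bad.add(line.trim());
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        // Usage (hypothetical): java MustZeroCheck log.4
        List<String> bad = violations(Files.readAllLines(Paths.get(args[0])));
        bad.forEach(System.out::println);
        System.exit(bad.isEmpty() ? 0 : 1); // non-zero exit if any counter is not 0
    }
}
```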
Step 5: Split tags on the zeroD meta file [yes|no|tbd]
- Program: SplitZeroDMetaFile.java
- Output:
  - zeroD.yes.data
  - zeroD.no.data
  - zeroD.tbd.data
- Notes: If zeroD.tbd.data is not empty:
  - Send the tbd dPairs (zeroD.tbd.data) to linguists to tag [yes|no].
  - Put the tagged file in ${DERIVATION}/data/${YEAR}/dataOrg/Tags/zeroD.tbd.data.tagged.txt
  - Append the tagged file to ./dataOrg/zeroD.tag.txt
  - Then run Step 5a, and rerun Steps 4~5 until zeroD.tbd.data is empty
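The split in Step 5 routes each meta line to one of three buckets by its tag. The sketch below is not SplitZeroDMetaFile.java. It assumes, hypothetically, that the tag is the last `|`-separated field of each line in zeroD.meta.data, and it routes any unexpected tag to tbd so that it goes back to the linguists.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SplitByTag {
    // Splits meta lines into yes/no/tbd buckets, assuming (hypothetically) that the tag
    // is the last '|'-separated field of each line.
    static Map<String, List<String>> split(List<String> metaLines) {
        Map<String, List<String>> buckets = new LinkedHashMap<>();
        buckets.put("yes", new ArrayList<>());
        buckets.put("no", new ArrayList<>());
        buckets.put("tbd", new ArrayList<>());
        for (String line : metaLines) {
            String tag = line.substring(line.lastIndexOf('|') + 1).trim();
            // Any tag other than yes/no goes to tbd, so it is sent back to the linguists.
            buckets.getOrDefault(tag, buckets.get("tbd")).add(line);
        }
        return buckets;
    }

    public static void main(String[] args) {
        Map<String, List<String>> b = split(List.of("run|noun|run|verb|yes", "cut|noun|cut|adj|tbd"));
        System.out.println(b); // prints {yes=[run|noun|run|verb|yes], no=[], tbd=[cut|noun|cut|adj|tbd]}
    }
}
```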
Step 5a: Clean up tags in the tagged file
- Program: CleanUpDPairTagList.java
- Output:
  - ${SRC_DIR}:
- Notes: Re-run this step until:
  - conflict = 0. If not, send the conflicts (from log.5a) to linguists to re-tag. Do NOT replace zeroD.tag.txt with zeroD.tag.txt.cleanUp until conflict = 0.
  - duplicate = 0. If not, replace zeroD.tag.txt with zeroD.tag.txt.cleanUp.
  - diff = 0. If not, replace zeroD.tag.txt with zeroD.tag.txt.cleanUp.
  - Then rerun Steps 4~5 until the conflict, tbd, invalid tag, and other counts are 0.
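The three Step 5a counters (conflict, duplicate, diff) can be understood as simple list checks. The sketch below is not CleanUpDPairTagList.java. It assumes a hypothetical tag-line format in which everything before the last `|` identifies the dPair and the last field is its tag, and it treats "clean up" as sort-and-deduplicate.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class TagListCleanUp {
    // Hypothetical line format: the dPair key is everything before the last '|',
    // and the tag is the last field.
    static String key(String line) {
        int i = line.lastIndexOf('|');
        return i < 0 ? line : line.substring(0, i);
    }

    static String tag(String line) {
        return line.substring(line.lastIndexOf('|') + 1);
    }

    // Keys that carry more than one distinct tag (the "conflict" count).
    static Set<String> conflicts(List<String> lines) {
        Map<String, Set<String>> tagsByKey = new TreeMap<>();
        for (String line : lines) {
            tagsByKey.computeIfAbsent(key(line), k -> new TreeSet<>()).add(tag(line));
        }
        Set<String> conflicted = new TreeSet<>();
        tagsByKey.forEach((k, tags) -> {
            if (tags.size() > 1) {
                conflicted.add(k);
            }
        });
        return conflicted;
    }

    // Number of exact duplicate lines (the "duplicate" count).
    static int duplicates(List<String> lines) {
        return lines.size() - new HashSet<>(lines).size();
    }

    // Sorted, de-duplicated copy, playing the role of zeroD.tag.txt.cleanUp.
    static List<String> cleanUp(List<String> lines) {
        return new ArrayList<>(new TreeSet<>(lines));
    }

    // True when the cleaned-up list differs from the original (the "diff" check).
    static boolean differs(List<String> lines) {
        return !cleanUp(lines).equals(lines);
    }

    public static void main(String[] args) {
        List<String> lines = List.of("walk|noun|walk|verb|yes", "run|noun|run|verb|no",
                "run|noun|run|verb|no", "walk|noun|walk|verb|no");
        System.out.println(conflicts(lines)); // prints [walk|noun|walk|verb]
        System.out.println(duplicates(lines)); // prints 1
    }
}
```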
Step 6: Verify dType on valid zeroDs
- Program: DType.java
- Input:
  - ${ALL_SRC_DIR}:
  - ${TAR_DIR}:
- Output:
  - zeroD.yes.data.type
  - zeroD.yes.data.type.Z
  - zeroD.yes.data.type.S
  - zeroD.yes.data.type.P
  - zeroD.yes.data.type.ZS
  - zeroD.yes.data.type.SS
  - zeroD.yes.data.type.PS
  - zeroD.yes.data.type.U
- Notes: The following counts should be 0:
  - prefixD (|P|)
  - suffixD (|S|)
  - zeroD by SpVars (|ZS|)
  - prefixD by SpVars (|PS|)
  - suffixD by SpVars (|SS|)
  - unknown dType (|U|)
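The dType codes above (Z, S, P, ZS, SS, PS, U) suggest a spelling-based classification of each pair. The sketch below is only an illustration of that idea and is not DType.java's actual logic. It assumes a suffixD is a derived term that starts with the base, a prefixD is one that ends with the base, comparisons are case-insensitive, and a match found only through a spelling variant of the base gets the extra "S".

```java
import java.util.List;
import java.util.Locale;

final class DTypeSketch {
    // Compares two spellings directly:
    // Z = identical, S = derived starts with base (suffixD), P = derived ends with base (prefixD).
    static String direct(String base, String derived) {
        String b = base.toLowerCase(Locale.ROOT);
        String d = derived.toLowerCase(Locale.ROOT);
        if (b.equals(d)) {
            return "Z";
        }
        if (d.length() > b.length() && d.startsWith(b)) {
            return "S";
        }
        if (d.length() > b.length() && d.endsWith(b)) {
            return "P";
        }
        return "U";
    }

    // Tries the base itself first, then its spelling variants; a match found only
    // through a spVar gets an extra "S" (ZS, SS, PS). No match at all gives U.
    static String dType(String base, String derived, List<String> baseSpVars) {
        String t = direct(base, derived);
        if (!t.equals("U")) {
            return t;
        }
        for (String spVar : baseSpVars) {
            String v = direct(spVar, derived);
            if (!v.equals("U")) {
                return v + "S";
            }
        }
        return "U";
    }

    public static void main(String[] args) {
        System.out.println(dType("run", "run", List.of()));               // Z
        System.out.println(dType("kind", "kindness", List.of()));         // S
        System.out.println(dType("able", "unable", List.of()));           // P
        System.out.println(dType("color", "colour", List.of("colour")));  // ZS
    }
}
```

For a valid zeroD list, every pair would ideally land in Z, which is why the other buckets are expected to be empty.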
Step 7: Automatically add the negation tag [O] to all valid zeroD pairs, then sort -u
- Programs: AddNegationTagToFile.java, DPairTagList.java
- Output:
  - zeroD.yes.data.${YEAR}
  - zeroD.yes.data.${YEAR}.conflict
- Notes:
  - The following should be 0:
    - `-- Empty line no (must = 0): 0`
    - `-- Invalid tag no (must = 0): 0`
    - `- none (tbd) tag no (must be 0): 0`
  - If "conflict (N|O) tag no:" is not 0:
    - The conflict file (zeroD.yes.data.${YEAR}.conflict, e.g. zeroD.yes.data.2018.conflict) lists all zeroD tags that are inconsistent between SpVars in two records
    - Send the conflicts to linguists to tag [yes|no|both] on EUI lines
    - In the past, there have been no "both" cases in zeroD
    - Manually update zeroD.tag.txt with the results
    - Re-run Steps 4~7
  - If "conflict (N|O) tag no:" is 0:
    - The program generates zeroD.yes.data.${YEAR}
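Tagging every valid pair and then running `sort -u` is a map followed by a sort-and-deduplicate. The sketch below is not AddNegationTagToFile.java. It assumes, hypothetically, that the [O] tag is appended as a final `|O` field. A `TreeSet` orders strings by Java's `String.compareTo`, which for ASCII input matches `LC_ALL=C sort -u`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

final class AddNegationTag {
    // Hypothetical format: the negation tag is appended as a final "|O" field.
    static final String NEGATION_TAG = "O";

    // Appends the tag to every valid zeroD pair, then sorts and removes duplicates
    // (TreeSet ordering matches `LC_ALL=C sort -u` for ASCII input).
    static List<String> addNegationTag(List<String> validPairs) {
        TreeSet<String> sortedUnique = new TreeSet<>();
        for (String pair : validPairs) {
            sortedUnique.add(pair + "|" + NEGATION_TAG);
        }
        return new ArrayList<>(sortedUnique);
    }

    public static void main(String[] args) {
        List<String> pairs = List.of("walk|noun|walk|verb", "run|noun|run|verb", "walk|noun|walk|verb");
        addNegationTag(pairs).forEach(System.out::println);
        // prints:
        // run|noun|run|verb|O
        // walk|noun|walk|verb|O
    }
}
```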
Step 8: Check affixes on zeroDs
- Program: CheckDerivationByAffix6.java
- Input:
  - ${ALL_DRC_DIR}:
  - ${SRC_DIR}:
  - ${TAR_DIR}:
- Notes:
  - `The number of possible invalid zeroD: 0` (should be 0)
  - Make sure zeroD.pattern3.rpt is empty. If not, send it to linguists to tag [Yes|No]:
    - invalid dPair [No]: add to zeroD.tagNo.txt, then rerun Steps 3~6 (?? this file is never used in zeroD)
    - valid dPair [Yes]: add to zeroD.tagYes.txt, then rerun Steps 3~6
  - Note that both files above should be empty, because there is no exception to the affix check on zeroD.
Step 9:
- Input: See above
- Output: See above
- Notes: Not recommended!
Step 10: Auto-fix zeroD.tag.txt for conflicts by SpVar
- Program: FixConflictDPairTags.java
- Files:
  - ${SRC_DIR}:
    - zeroD.tag.txt.${YEAR}
    - zeroD.meta.data.conflict.tag.data
    - zeroD.tag.txt.${YEAR}.fixDPair
- Notes: Not used from 2014 onward!