Option | Description | Input | Output | Notes | Option
|
---|
30 |
- Get antonym candidates from prefixD
- derivation.getantcandfromprefixd.java
|
- ${PD_DIR}/input/derivation.data
- ${LEX_DIR}/input/inflvars.data
- ${ANT_DIR}/input/antCand.data.tag.${YEAR}
- ${ANT_DIR}/input/domain.data
|
- ./output/Cand/antCandPrefixD.data
=> main output file; includes both tagged and untagged aPairs.
- ./output/Cand/antCandPrefixD.data.tag
=> aPairs already tagged
- ./output/Cand/antCandPrefixD.data.tbd
=> aPairs to be tagged; should contain 0 aPairs once tagging is completed (|CANON_TBD|)
- ./output/candTagged/antCandPrefixD.data.tag.tagged
=> should be the same as ./output/Cand/antCandPrefixD.data.tag at this step (because all tagged aPairs are PD). It is used as the base (plus newly tagged aPairs) to check tagged aPairs from PD.
|
- If running for the first time:
- mkdir ./${year}/output/Cand
- mkdir ./${year}/output/candTagged
- use updated derivation.data and inflvars.data
- send antCandPrefixD.data.tbd.${YEAR}.${VERSION} to linguist to complete the tags
After the 2026 release, 5,784 tbd aPairs remain to be tagged. Once this backlog is completely tagged, the number is expected to be much smaller for each annual release (limited to the annual growth of prefixD).
| 30
|
31 |
- Validate and fix tags of antonym candidates (PD)
- Antonym.ValidateTaggedCand.java
|
- ./output/candTagged/antCandPrefixD.data.tag.tagged
- ${ANT_DIR}/input/domain.data
|
- ./output/candTagged/antCandPrefixD.data.tag.fixed
|
- Append the linguist's tags to ${PD_DIR}/output/candTagged/antCandPrefixD.data.tag.tagged
- Run this step until the tagged and fixed files are the same.
- The fixed file contains the auto-fixes, which change [TYPE_TBD] and [DOMAIN_TBD] to [NA] and [DOMAIN_NONE], respectively.
- shell> sort -u antCandPrefixD.data.tag.fixed > antCandPrefixD.data.tag.fixed.uSort
- Manually make a backup copy of the sorted fixed file as the tagged file antCandPrefixD.data.tag.tagged.${YEAR}.${NO}
- Use the fixed file as the input tag file to re-run this program until the input and output are the same.
- Manually copy the tagged file as the release file antCandPrefixD.data.tag.tagged.${YEAR}
| 31
|
32 |
- Update the release antonym tagged file from PD
- Antonym.UpdateAllTaggedFile
|
- ./output/candTagged/antCandPrefixD.data.tag.tagged.${YEAR}
- ${ANT_DIR}/input/antCand.data.tag.${YEAR}
- ${ANT_DIR}/input/domain.data
|
- ${ANT_DIR}/input/antCand.data.tag.updated
- ${ANT_DIR}/input/antCand.data.tag.updated.srcConflict
- ${ANT_DIR}/input/antCand.data.tag.updated.tagConflict
|
- This step automatically updates the tag file for all antonym candidates.
- Add new tags to the tag file.
- If tags exist, update the source in the order of LEX, SD, PD, CC, SN.
- Print out source conflicts (duplicate records that differ only in src). This is OK! For example:
- activate|E0007090|deactivate|E0417566|verb|Y|UB|BN2|quality|SN
- activate|E0007090|deactivate|E0417566|verb|Y|UB|BN2|quality|PD
- Print out tag conflicts. These must be fixed (manually).
- Conflicting tags can be type, negation, or domain.
- Send antCand.data.tag.updated.tagConflict to the linguist to fix.
- Then, manually fix one by one on both input files:
- antCandPrefixD.data.tag.tagged.${YEAR}
- antCand.data.tag.${YEAR}
- cd ${ANT_DIR}/input
- Manually copy antCand.data.tag.updated to antCand.data.tag.updated.3.PD
- Manually copy/link antCand.data.tag.updated to antCand.data.tag.${YEAR}
- The output file is used to generate antonym and negation files for the release.
- Re-run steps 30-32 until it passes all steps.
=> update ${ANT_DIR}/input/antCand.data.tag.${YEAR} in step 32
=> update ${PREFIXD_DIR}/output/candTagged/antCandPrefixD.data.tag.tagged.${YEAR} in step 31
--- antCand.data.tag.${YEAR} ---
- Total tag conflict no = 0
- Total source conflict no = 0
- Total duplicate tag = 0
--- antCandPrefixD.data.tag.tagged.${YEAR} ---
- tag conflict no = 0
| 32
|
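
The difference between source conflicts (OK) and tag conflicts (must be fixed) in step 32 can be illustrated with a short Java sketch. This is not the actual Antonym.UpdateAllTaggedFile code: the class TagConflictSketch and its methods are hypothetical, and the record layout is an assumption inferred from the example records in step 32 (the first five fields identify the aPair and part of speech, the last field is the source, and the fields in between are the tags).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not the actual Antonym.UpdateAllTaggedFile implementation.
public class TagConflictSketch {
    // Assumed layout: word1|EUI1|word2|EUI2|pos|<tag fields...>|source
    private static final int KEY_FIELDS = 5;

    // Join fields [from, to) back together with '|'.
    private static String join(String[] fields, int from, int to) {
        return String.join("|", Arrays.copyOfRange(fields, from, to));
    }

    // Group records by aPair key; each key maps tag string -> set of sources.
    public static Map<String, Map<String, Set<String>>> group(List<String> lines) {
        Map<String, Map<String, Set<String>>> groups = new LinkedHashMap<>();
        for (String line : lines) {
            String[] f = line.split("\\|", -1);
            if (f.length <= KEY_FIELDS + 1) {
                continue; // skip records without at least one tag field and a source
            }
            String key = join(f, 0, KEY_FIELDS);
            String tags = join(f, KEY_FIELDS, f.length - 1);
            String src = f[f.length - 1];
            groups.computeIfAbsent(key, k -> new LinkedHashMap<>())
                  .computeIfAbsent(tags, t -> new LinkedHashSet<>())
                  .add(src);
        }
        return groups;
    }

    // Source conflicts: same key and same tags, but more than one source.
    public static List<String> sourceConflicts(List<String> lines) {
        List<String> out = new ArrayList<>();
        group(lines).forEach((key, byTags) -> byTags.forEach((tags, srcs) -> {
            if (srcs.size() > 1) {
                out.add(key + "|" + tags + " " + srcs);
            }
        }));
        return out;
    }

    // Tag conflicts: same key, but more than one distinct tag string.
    public static List<String> tagConflicts(List<String> lines) {
        List<String> out = new ArrayList<>();
        group(lines).forEach((key, byTags) -> {
            if (byTags.size() > 1) {
                out.add(key + " " + byTags.keySet());
            }
        });
        return out;
    }

    public static void main(String[] args) {
        // The two example records from step 32: identical except for the source.
        List<String> lines = List.of(
            "activate|E0007090|deactivate|E0417566|verb|Y|UB|BN2|quality|SN",
            "activate|E0007090|deactivate|E0417566|verb|Y|UB|BN2|quality|PD");
        System.out.println("source conflicts: " + sourceConflicts(lines));
        System.out.println("tag conflicts: " + tagConflicts(lines));
    }
}
```

Under these assumptions, the two example records count as one source conflict and no tag conflict, which matches the note that source duplicates are acceptable while tag conflicts need a manual fix.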