Option and Description | Input | Output | Notes
|
---|
65
- Get antonyms from MEDLINE 3-grams by a specified middle keyword (and/or):
- Medline.GetAntCandFrom3GramPatMid.java
|
- ${ML_NGRAM_DIR}/input/3-gram.${YEAR}.30.core
- ${META_DIR}/input/normTermCui.data
- ${META_DIR}/input/MRSTY.RRF
- ${LEX_DIR}/input/inflVars.data
- ${LEX_DIR}/input/synonym.data
- ${ANT_DIR}/input/antCand.data.tag.${YEAR}
- ${ANT_DIR}/input/domain.data
- /nfsvol/lex/Lu/Projects/LVG/lvg${LVG_YEAR}/data/config/lvg.properties
|
- ./output/PreCand/antCandPatMid.andOr.data
|
- This step is not part of the annual process, but it might be needed before Step-66.
- This step pre-runs Step-66 using 1 middle word in 3-grams to get collocates for antonyms. Run it to make sure everything is OK before running Step-66.
- If running for the first time:
- shell> mkdir ./output/PreCand
- make sure all input files are set up correctly
- Different data versions are used because the data sets have different release dates:
- Lexicon Antonym release: ${YEAR}
- META-thesaurus: ${PREV_YEAR}AA
- MEDLINE: ${PREV_YEAR}
- LVG: ${PREV_YEAR}
- This program sets the default keyword to "and/or".
|
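The first-run setup in the notes above can be sketched as two small shell helpers. This is a minimal sketch, not the project's actual script: the function names `prepare_precand` and `check_inputs` are hypothetical, and the input paths are assumed to be passed in by the caller.

```shell
# Hypothetical helpers for the first run of Step-65 (names are assumptions).

# Create the output directory used by Steps 65-67.
prepare_precand() {
  mkdir -p ./output/PreCand
}

# Report every input file that does not exist.
# Succeeds only if all files passed as arguments are present.
check_inputs() {
  local missing=0
  local f
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing input: $f" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

A possible call, using paths from the input column: `prepare_precand && check_inputs "${META_DIR}/input/normTermCui.data" "${META_DIR}/input/MRSTY.RRF"`.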
66
- Get antonyms from MEDLINE 3-grams by specified middle keywords
- Medline.GetAntCandFrom3GramPatMid.java
|
- ${ML_NGRAM_DIR}/input/3-gram.${YEAR}.30.core
- ${META_DIR}/input/normTermCui.data
- ${META_DIR}/input/MRSTY.RRF
- ${LEX_DIR}/input/inflVars.data
- ${LEX_DIR}/input/synonym.data
- ${ANT_DIR}/input/antCand.data.tag.${YEAR}
- ${ANT_DIR}/input/domain.data
- /nfsvol/lex/Lu/Projects/LVG/lvg${LVG_YEAR}/data/config/lvg.properties
|
- ./output/PreCand/antCandPatMid.${KEY_WORD}.data
|
- Currently, this program includes the 8 highest-frequency keywords: "and or to versus than vs from and|or", as defined in the scripts.
- The latest data are used, with different versions because the data sets have different release dates:
- Lexicon Antonym release: ${YEAR}
- Lexicon: ${YEAR}
- META-thesaurus: ${PREV_YEAR}AA
- MEDLINE: ${PREV_YEAR}
- LVG: ${PREV_YEAR}
|
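Running the step once per keyword can be sketched as a simple loop. This is only an illustration: the source does not show how the scripts invoke the Java program, so the command to run is passed in by the caller, and `for_each_keyword` is a hypothetical name.

```shell
# The 8 keywords listed in the notes above.
KEY_WORDS="and or to versus than vs from and|or"

# Hypothetical helper: run the given command once per keyword,
# appending the keyword as the last argument. Stops on the first failure.
for_each_keyword() {
  local kw
  for kw in $KEY_WORDS; do
    "$@" "$kw" || return 1
  done
}
```

For example, `for_each_keyword echo` prints one keyword per line, which is a quick way to confirm the list before substituting the real command.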
67
- Get antCand by combining results from above steps: 65-66
- Medline.CombineAntCandFrom3GramPatMid.java
|
- ./output/PreCand/antCandPatMid.${KEY_WORD}.data.wc
- ./output/PreCand/keyWords.data
|
- ./output/PreCand/antCandPatMid.cand.data.raw
=> includes raw collocates that occur once in 1 of the 8 keywords
- ./output/PreCand/antCandPatMid.cand.data.filtered
Heuristic filter rules:
=> includes filtered collocates that occur in 3 of the 8 keywords, do not include "other|E0044444", and are not self-aPairs
=> is the sum of the files: tag + tbd
- ./output/PreCand/antCandPatMid.cand.data.tag
- ./output/PreCand/antCandPatMid.cand.data.tag.CC
- ${ML_DIR}/output/Cand/antCandPatMid.cand.data.tbd
|
- If running for the first time:
- shell> mkdir Cand
- shell> mkdir candTagged
- copy ${PreCand}/keyWords.data from ${PREV_YEAR}
- TBD should be 0
- If not, send the candidate file ${ML_DIR}/output/Cand/antCandPatMid.cand.data.tbd to a linguist for tagging
|
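The heuristic filter rules and the TBD check above can be sketched in shell. This is not the logic of Medline.CombineAntCandFrom3GramPatMid.java itself. The input layout `term1|EUI1|term2|EUI2|keyword` is an assumption made for illustration, "3 of 8 keywords" is read here as "at least 3", and the function names `filter_collocates` and `tbd_count` are hypothetical.

```shell
# Hypothetical filter. Assumed input layout (not the real file format):
#   term1|EUI1|term2|EUI2|keyword
# Prints each pair that occurs with at least 3 distinct keywords.
filter_collocates() {
  awk -F'|' '
    {
      pair = $1 "|" $2 "|" $3 "|" $4
      # Rule: drop pairs that include "other|E0044444".
      if (($1 == "other" && $2 == "E0044444") ||
          ($3 == "other" && $4 == "E0044444")) next
      # Rule: drop self-aPairs (same term and EUI on both sides).
      if ($1 == $3 && $2 == $4) next
      # Count each keyword only once per pair.
      if (!((pair, $5) in seen)) {
        seen[pair, $5] = 1
        count[pair]++
      }
    }
    END {
      for (p in count) if (count[p] >= 3) print p
    }
  ' "$1" | sort
}

# Number of lines in a tbd file; 0 means nothing is left to tag.
tbd_count() {
  wc -l < "$1" | tr -d ' '
}
```

If `tbd_count` on the tbd file prints anything other than 0, the file still has candidates that need tagging.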
68
- Validate and fix tags of antonym candidates (CC)
- Antonym.ValidateTaggedCand.java
|
- ${CC_DIR}/output/candTagged/antCandPatMid.data.tag.tagged
- ${ANT_DIR}/input/domain.data
|
- ${CC_DIR}/output/candTagged/antCandPatMid.data.tag.fixed
|
- Prepare/add tagged candidates to antCandPatMid.data.tag.tagged
- convert tagged candidate file to standard format:
shell> flds 3,4,5,6,7,8,9,10,11,12 antCandPatMid.cand.data.tbd.${YEAR}.${NO}.tagged > antCandPatMid.data.data.tbd.${YEAR}.${NO}.tagged.3-12
- append
antCandPatMid.data.data.tbd.${YEAR}.${NO}.tagged.3-12 to antCandPatMid.data.tag.tagged.${YEAR}.${NO}
- sort -u antCandPatMid.data.tag.tagged.${YEAR}.${NO} > antCandPatMid.data.tag.tagged.${YEAR}.${NO}.uSort
shell> ln -sf antCandPatMid.data.tag.tagged.${YEAR}.${NO}.uSort antCandPatMid.data.tag.tagged
- Run this step (68) until the tagged and fixed files are the same
- The fixed file contains the auto-fixes that change [TYPE_TBD] and [DOMAIN_TBD] to [NA] and [DOMAIN_NONE].
- Manually copy the fixed file to the tagged file, then run the step again until they are the same
- Manually copy antCandPatMid.data.tag.tagged to antCandPatMid.data.tag.tagged.${YEAR}
|
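The "run until the tagged and fixed files are the same" procedure above can be sketched as a loop. This is a minimal sketch under stated assumptions: the command that runs Antonym.ValidateTaggedCand.java is not shown in the source, so it is passed in by the caller, and both the name `run_until_stable` and the limit of 10 runs are hypothetical choices.

```shell
# Usage: run_until_stable TAGGED_FILE FIXED_FILE COMMAND [ARGS...]
# Runs COMMAND (assumed to read TAGGED_FILE and write FIXED_FILE),
# then copies the fixed file over the tagged file until they match.
run_until_stable() {
  local tagged=$1
  local fixed=$2
  shift 2
  local max_runs=10   # arbitrary safety limit, not from the source
  local run=1
  while [ "$run" -le "$max_runs" ]; do
    "$@" || return 1                 # run the validation step
    if cmp -s "$tagged" "$fixed"; then
      echo "stable after $run run(s)"
      return 0
    fi
    cp "$fixed" "$tagged"            # replaces the manual copy
    run=$((run + 1))
  done
  echo "not stable after $max_runs runs" >&2
  return 1
}
```

Because the tagged file is a symbolic link in the notes above, `cp` writes through the link to the underlying `.uSort` file. Keep a backup if the original needs to be preserved.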
69
- Update the release antonym tagged file from CC
- Antonym.UpdateAllTaggedFile.java
|
- ${CC_DIR}/output/candTagged/antCandPatMid.data.tag.tagged.${YEAR}
- ${ANT_DIR}/input/antCand.data.tag.${YEAR}
- ${ANT_DIR}/input/domain.data
|
- ${ANT_DIR}/input/antCand.data.tag.updated
|
- This step auto-updates the tag file for all antonym candidates
- Manually copy antCand.data.tag.updated to antCand.data.tag.updated.CC
- Manually copy antCand.data.tag.updated to antCand.data.tag.${YEAR}
- The output file is used to generate antonym and negation files for the release.
- Re-run steps 66-69 until all steps pass
- Re-run steps 66-67 to generate the latest aPair candidate list for linguists
|