CandidateUtil.AnalyzeCandidateHistogram
cronymExp.tag.data.tag.new.yesNo.his.min-max.sec.csv
| Developed, not used for analysis yet
| Precision and Recall Analysis for AMIA full paper
|
30 | Tag Matcher-(ACR): baseline to be used as gold standard
- Must get the latest inflVars.data from Lexicon
- Must run steps 7-9 first (for invalidMwForParAcr.data.final)
TagCandidateFile.java auto-tag:
- [y]: if it is in Lexicon (inflVars.data)
- [n]: invalidMwForParAcr.data.final
- [tbd]: otherwise
|
- inFile: acronymExp.subterm.raw.core
- validFile: inflVars.data.current
- invalidFile: invalidMwForParAcr.data.current
|
- acronymExp.subterm.raw.core.tag.${YEAR}
- acronymExp.subterm.raw.core.tag.${YEAR}.no
- acronymExp.subterm.raw.core.tag.${YEAR}.tbd
- acronymExp.subterm.raw.core.tag.${YEAR}.yes
- acronymExp.subterm.raw.core.tag.${YEAR}.yesNo
=> Used as the gold standard for precision and recall
|
- Must run steps 7-9 first (to update invalidMwForParAcr.data.current)
- Must update inflVars.data.current from Lexicon (approve all submitted records)
|
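The auto-tag rule in step 30 can be sketched as a lookup with a fixed precedence. This is a minimal sketch, not the TagCandidateFile.java source: the class name AutoTagSketch, the method autoTag, exact-string matching, and the in-memory sets standing in for inflVars.data.current and invalidMwForParAcr.data.current are all assumptions for illustration.

```java
import java.util.Set;

class AutoTagSketch {
    // Hypothetical helper: returns "y", "n", or "tbd" in the rule order listed in step 30.
    // validTerms stands in for the Lexicon terms (inflVars.data.current),
    // invalidTerms for invalidMwForParAcr.data.current. Exact matching is an assumption.
    static String autoTag(String term, Set<String> validTerms, Set<String> invalidTerms) {
        if (validTerms.contains(term)) {
            return "y";      // in Lexicon
        }
        if (invalidTerms.contains(term)) {
            return "n";      // known invalid
        }
        return "tbd";        // needs manual review
    }

    public static void main(String[] args) {
        // Made-up terms, only to exercise the three branches.
        Set<String> valid = Set.of("term a");
        Set<String> invalid = Set.of("term b");
        for (String t : new String[] {"term a", "term b", "term c"}) {
            System.out.println(t + "|" + autoTag(t, valid, invalid));
        }
    }
}
```

Checking the valid list first means a term that appears in both lists is tagged [y]; whether the real tool resolves such conflicts the same way is not stated here.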
31 | Get precision, recall, F1 for Baseline (acronym expansion)
GetPRF.java
- Must run step-30 first
|
- goldStd: acronymExp.subterm.raw.core.tag.${YEAR}.yesNo
- test: acronymExp.subterm.raw.core.tag.${YEAR}.yesNo.f1
|
- Output (PRF) is on the screen
|
- No. of "not goldStd" entries: must be 0
- No. of "err tag" entries: must be 0
- Must finish step 30
|
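For reference, the precision, recall, and F1 computation in step 31 can be sketched as below. This is an illustrative sketch under assumptions, not the GetPRF.java source: the class PrfSketch, the term-to-tag maps, and the reading of the two sanity counters (test terms missing from the gold standard, and tags other than y/n) are hypothetical.

```java
import java.util.Map;

class PrfSketch {
    // Hypothetical helper: compares test tags against gold-standard tags.
    // Returns {precision, recall, f1, notInGold, badTag}.
    // Both maps go from term to tag ("y" or "n"); this layout is an assumption.
    static double[] prf(Map<String, String> gold, Map<String, String> test) {
        int tp = 0, fp = 0, fn = 0, notInGold = 0, badTag = 0;
        for (Map.Entry<String, String> e : test.entrySet()) {
            String g = gold.get(e.getKey());
            String t = e.getValue();
            if (g == null) {            // test term missing from the gold standard
                notInGold++;
                continue;
            }
            boolean gy = g.equals("y");
            boolean ty = t.equals("y");
            if (!(gy || g.equals("n")) || !(ty || t.equals("n"))) {
                badTag++;               // tag is neither y nor n
                continue;
            }
            if (ty && gy) {
                tp++;                   // true positive
            } else if (ty) {
                fp++;                   // false positive
            } else if (gy) {
                fn++;                   // false negative
            }
        }
        double p = (tp + fp == 0) ? 0.0 : (double) tp / (tp + fp);
        double r = (tp + fn == 0) ? 0.0 : (double) tp / (tp + fn);
        double f1 = (p + r == 0.0) ? 0.0 : 2 * p * r / (p + r);
        return new double[] {p, r, f1, notInGold, badTag};
    }
}
```

In this sketch the last two values play the role of the two counters that must be 0: a non-zero value would mean the test file and the gold standard are out of sync.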
32 | Tag (ACR) + Distilled set, PRF
CandidateUtil.ApplyDistToFile
CandidateUtil.GetPRF
|
- in: acronymExp.subterm.raw.core.tag.${YEAR}.yesNo.f1
- dist: nGrams/distilledNGram.${YEAR}.core
- goldStd: acronymExp.subterm.raw.core.tag.${YEAR}.yesNo
|
- acronymExp.subterm.raw.core.tag.${YEAR}.yesNo.f1.dist
|
- Must finish steps 30, 31
|
33 | Tag (ACR) + SpVar, PRF
CandidateUtil.ApplySpVarToFile
CandidateUtil.GetPRF
|
- in: acronymExp.subterm.raw.core.tag.${YEAR}.yesNo.f1
- spVar: Candidates/distilledNGram.2014.core.150.sort.term.spVars.latest
- goldStd: acronymExp.subterm.raw.core.tag.${YEAR}.yesNo
|
- acronymExp.subterm.raw.core.tag.${YEAR}.yesNo.f1.spVar
|
- Must finish steps 30, 31
|
34 | Tag (ACR) + CUI, PRF
CandidateUtil.ApplyCuiToFile
CandidateUtil.GetPRF
|
- in: acronymExp.subterm.raw.core.tag.${YEAR}.yesNo.f1
- smt: data/Config/smt.properties
- goldStd: acronymExp.subterm.raw.core.tag.${YEAR}.yesNo
|
- acronymExp.subterm.raw.core.tag.${YEAR}.yesNo.f1.cui
|
- Must finish steps 30, 31
|
35 | Tag (ACR) + EndWord, PRF
CandidateUtil.ApplyEndWordToFile
CandidateUtil.GetPRF
|
- in: acronymExp.subterm.raw.core.tag.${YEAR}.yesNo.f1
- endWord: inFilterEndWord.data.used
- goldStd: acronymExp.subterm.raw.core.tag.${YEAR}.yesNo
|
- acronymExp.subterm.raw.core.tag.${YEAR}.yesNo.f1.endWord
|
- Must finish steps 30, 31
|
36 | Tag (ACR) + CUI + SpVar, PRF
CandidateUtil.ApplySpVarToFile
CandidateUtil.GetPRF
|
- in: acronymExp.subterm.raw.core.tag.${YEAR}.yesNo.f1.cui
- spVar: Candidates/distilledNGram.2014.core.150.sort.term.spVars.latest
- goldStd: acronymExp.subterm.raw.core.tag.${YEAR}.yesNo
|
- acronymExp.subterm.raw.core.tag.${YEAR}.yesNo.f1.cui.spVar
|
- Must finish steps 30, 31, 34
|
37 | Tag (ACR) + CUI + SpVar + EndWord, PRF
CandidateUtil.ApplyEndWordToFile
CandidateUtil.GetPRF
|
- in: acronymExp.subterm.raw.core.tag.${YEAR}.yesNo.f1.cui.spVar
- endWord: inFilterEndWord.data.used
- goldStd: acronymExp.subterm.raw.core.tag.${YEAR}.yesNo
|
- acronymExp.subterm.raw.core.tag.${YEAR}.yesNo.f1.cui.spVar.endWord
|
- Must finish steps 30, 31, 34, 36
|
Frequency (WC) Analysis on (ACR) for AMIA poster paper
|
40 | Add WC to GoldStd
CandidateUtil.AddWcToTermTagFile
|
- in: acronymExp.subterm.raw.core.tag.${YEAR}.yesNo
- ngram_wc: nGrams/nGramSet.${YEAR}.30.core
|
- acronymExp.subterm.raw.core.tag.${YEAR}.yesNo.wc
|
- Must finish n-gram core term
- Must finish step 30
|
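Adding WC to a tagged term file, as in step 40, amounts to a lookup of each term in the n-gram set. The sketch below is illustrative only: the class AddWcSketch, the "|" field separator, the term being the first field, and a WC of 0 for terms missing from the n-gram set are all assumptions, not documented behavior of CandidateUtil.AddWcToTermTagFile.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class AddWcSketch {
    // Hypothetical helper: appends the word count (WC) of each term as a new last field.
    // lines: records such as "term|tag" (assumed layout).
    // ngramWc: term -> WC, standing in for nGrams/nGramSet.${YEAR}.30.core.
    static List<String> addWc(List<String> lines, Map<String, Long> ngramWc) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String term = line.split("\\|", -1)[0];    // first field is the term
            long wc = ngramWc.getOrDefault(term, 0L);  // assumed: 0 when not found
            out.add(line + "|" + wc);
        }
        return out;
    }
}
```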
41 | Get Histogram of GoldStd
CandidateUtil.GetPRFHistogram
|
- acronymExp.subterm.raw.core.tag.${YEAR}.yesNo.wc
|
- acronymExp.subterm.raw.core.tag.${YEAR}.yesNo.wc.minWc-maxWc.increment.prfHis.csv
|
- Run this once first to get the max. WC, then use that value as input
|
Frequency (WC) Analysis on LEXICON for AMIA poster paper
|
45 | Add WC to LMWs and LSWs
CandidateUtil.GetSwMwFromLexicon
=> get LMWs and LSWs from Lexicon (inflVars.data)
CandidateUtil.AddWcToTermFile
=> Add WC to LSWs
CandidateUtil.AddWcToTermFile
=> Add WC to LMWs
|
- inflVars.data
- nGrams/nGramSet.${YEAR}.30.core
|
- ./10.LexWords/inflVars.data.lsw
- ./10.LexWords/inflVars.data.lmw
- ./10.LexWords/inflVars.data.lsw.wc
- ./10.LexWords/inflVars.data.lmw.wc
|
- This data is used in Figure-1 WC spectrum: no. of terms vs. WC class
|
46 | Get Histogram of LSWs
CandidateUtil.GetHistogram
|
- ./10.LexWords/inflVars.data.lsw.wc
|
- ./10.LexWords/inflVars.data.lsw.wc/minWc-maxWc.incWc.his.csv
|
- Run this once first to get the max. WC, then use that value as input
|
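The two-pass use of the histogram step (one run to find the max. WC, then a run with minWc, maxWc, and the increment as inputs) can be sketched as follows. This is a minimal sketch, not the CandidateUtil.GetHistogram source: the class WcHistogramSketch, fixed-width bins, and skipping values outside the range are assumptions.

```java
import java.util.List;

class WcHistogramSketch {
    // First pass: find the largest WC so it can be fed back in as maxWc.
    static long maxWc(List<Long> wcs) {
        long max = 0;
        for (long w : wcs) {
            max = Math.max(max, w);
        }
        return max;
    }

    // Second pass: count terms per WC class of width incWc, from minWc to maxWc.
    // Bin i covers [minWc + i*incWc, minWc + (i+1)*incWc); out-of-range WCs are skipped.
    static int[] histogram(List<Long> wcs, long minWc, long maxWc, long incWc) {
        int nBins = (int) ((maxWc - minWc) / incWc) + 1;
        int[] bins = new int[nBins];
        for (long w : wcs) {
            if (w < minWc || w > maxWc) {
                continue;
            }
            bins[(int) ((w - minWc) / incWc)]++;
        }
        return bins;
    }
}
```

With made-up WCs 0, 5, 10, 12, 30 and minWc=0, maxWc=30, incWc=10, the sketch yields four bins with counts 2, 2, 0, 1.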
47 | Get Histogram of LMWs
|
- ./10.LexWords/inflVars.data.lmw.wc
|
- ./10.LexWords/inflVars.data.lmw.wc/minWc-maxWc.incWc.his.csv
|
- Run this once first to get the max. WC, then use that value as input
|