TC Package - PreProcess Procedures
This preprocess should be performed after the baseline software of the new release is completed. Please follow the annual release procedures for a new release. This page details the preprocess procedures for generating files for JDI, STI, and STRI. Please refer to the PreProcess Design & Requirements section for design details.
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE SerialsSet PUBLIC "-//NLM//DTDSERIALS, 1st January 2010//EN"
"http://www.nlm.nih.gov/databases/dtd/nlmserials_100101.dtd">
- shell> cd ${TC}/preProcess/tcPre2008/bin
- shell> 0.GetMedLineFiles
2010
2010
${YEAR}
11
- shell> cd ${TC_DIR}/tcPre2008/data/2010/Jdi
- shell> cp -rp Output Output.tc${YEAR}
- shell> 2.DeployJdiFilesToTc
${YEAR}
word | 2007 | 2008 | 2009 | 2010 | 2011 |
---|---|---|---|---|---|
risk | 464482 | | | | |
cancer | 388950 | 645291 | 705814 | 754647 | 792053 |
blood | 510753 | 608233 | 629776 | 644743 | 671190 |
therapy | 444975 | 645880 | 682715 | 695532 | 713875 |
function | | | | | |
case | 430815 | 699541 | 723212 | 756545 | |
Max. Signal | 510754 | 645881 | 705815 | 754648 | 792054 |
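
In the 2007 and 2011 columns of the table above, Max. Signal is one more than the largest listed word count. The source does not state how the value is chosen, so the following is only a minimal sketch that assumes this "largest count plus one" relationship; the function name `max_signal` and the dictionary layout are hypothetical and are not part of the TC tools.

```python
# Word counts copied from the 2007 and 2011 columns of the table above.
counts_by_year = {
    2007: {"risk": 464482, "cancer": 388950, "blood": 510753,
           "therapy": 444975, "case": 430815},
    2011: {"cancer": 792053, "blood": 671190, "therapy": 713875},
}


def max_signal(word_counts):
    """Assumed rule (not stated in the source): largest count plus one."""
    return max(word_counts.values()) + 1


for year, word_counts in sorted(counts_by_year.items()):
    # Compare the printed value with the Max. Signal row before using it.
    print(year, max_signal(word_counts))
```

A check like this may help catch a mistyped value before it is entered at the "input Max Signal" prompt.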
- shell> 1.TestJdi (15 min.)
previous year
current year
7
Releases | WordJdidWc | WordJdidDc | MhJdidDc | ShJdidDc |
---|---|---|---|---|
2008~2009 | 97.08% | 97.69% | 99.04% | 99.99% |
2009~2010 | 96.37% | 97.01% | 98.64% | 99.82% |
2010~2011 | 96.49% | 97.10% | 98.68% | 99.76% |
From JDI:
${TC_VERSION}
${DATA_YEAR}
10
input Max Signal
shell> 4.Deploy1stRunStriFilesToTc ${YEAR}
${TC_YEAR}
${DATA_YEAR}
11
shell> 5.RefineStDoc
${YEAR}
${YEAR}
1
1
2
humn
1
1
...
-- RefineStDocuments.RefineStDocuments(): humn, word Size: 28
1. applicant|humn|T016|8|0.5825909|false|0.6438478(0.8119695-0.16812167)
2. applicants|humn|T016|4|0.57193226|false|0.67198503(0.82519585-0.15321079)
3. delegate|humn|T016|6|0.52379584|false|0.61971915(0.77649516-0.15677604)
4. descendent|humn|T016|88|0.32893714|false|0.51979005(0.64032584-0.12053579)
5. human|humn|false
6. human|humn|false
7. human|humn|false
8. human|humn|false
9. human|humn|false
10. humans|humn|T016|93|0.44167516|false|0.676633(0.82196045-0.14532742)
11. individual|humn|T016|65|0.6035321|false|0.76930225(0.9201443-0.15084207)
12. individual|humn|T016|65|0.6035321|false|0.76930225(0.9201443-0.15084207)
13. individual|humn|T016|65|0.6035321|false|0.76930225(0.9201443-0.15084207)
14. interviewee|humn|T016|24|0.4677229|false|0.61210704(0.80709076-0.19498374)
15. invoker|humn|false
16. man|humn|T016|94|0.3168362|false|0.6577292(0.8419048-0.18417563)
17. man|humn|T016|94|0.3168362|false|0.6577292(0.8419048-0.18417563)
18. man|humn|T016|94|0.3168362|false|0.6577292(0.8419048-0.18417563)
19. owner|humn|T016|4|0.63383675|false|0.75085104(0.8832318-0.13238078)
20. owner|humn|T016|4|0.63383675|false|0.75085104(0.8832318-0.13238078)
21. producer|humn|T016|92|0.2529137|false|0.7038802(0.8824603-0.17858009)
22. recipient|humn|T016|105|0.11818392|false|0.39545894(0.4790922-0.083633274)
23. resident|humn|T016|69|0.5160713|false|0.630488(0.7693282-0.1388402)
24. sponsor|humn|T016|65|0.36086074|false|0.60545766(0.75192285-0.14646521)
25. swimmer|humn|T016|2|0.62622374|false|0.7654458(0.86221415-0.09676831)
26. swimmer|humn|T016|2|0.62622374|false|0.7654458(0.86221415-0.09676831)
27. swimmer|humn|T016|2|0.62622374|false|0.7654458(0.86221415-0.09676831)
28. user|humn|T016|39|0.25950813|false|0.7680406(0.8928807-0.1248401)
...
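
The long-form lines in the output above end with a value followed by a parenthesized subtraction, for example `0.6438478(0.8119695-0.16812167)`, and the leading value matches that difference to within rounding. The sketch below parses such a line and checks the arithmetic. It is an illustration only: the field names (`word`, `group`, `sem_type`, `count`, `score`, `flag`, `diff`, `high`, `low`) are hypothetical labels chosen here, not names used by RefineStDocuments.

```python
import re

# Long form seen above: word|group|type|count|score|flag|diff(high-low)
# All group names are hypothetical labels for illustration.
LINE_RE = re.compile(
    r"^(?:\d+\.\s+)?"  # optional list number, e.g. "1. "
    r"(?P<word>[^|]+)\|(?P<group>[^|]+)\|(?P<sem_type>[^|]+)\|"
    r"(?P<count>\d+)\|(?P<score>[\d.]+)\|(?P<flag>true|false)\|"
    r"(?P<diff>[\d.]+)\((?P<high>[\d.]+)-(?P<low>[\d.]+)\)$"
)


def parse_line(line):
    """Return a dict for a long-form line, or None for any other line."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # e.g. the short form "human|humn|false"
    rec = match.groupdict()
    for key in ("score", "diff", "high", "low"):
        rec[key] = float(rec[key])
    rec["count"] = int(rec["count"])
    rec["flag"] = rec["flag"] == "true"
    return rec


def diff_is_consistent(rec, tol=1e-6):
    """Check that the printed difference equals high - low within tol."""
    return abs((rec["high"] - rec["low"]) - rec["diff"]) <= tol


sample = "1. applicant|humn|T016|8|0.5825909|false|0.6438478(0.8119695-0.16812167)"
rec = parse_line(sample)
print(rec["word"], rec["diff"], diff_is_consistent(rec))
```

Short-form lines such as `human|humn|false` do not match the pattern and are returned as `None`, so they can be counted or reviewed separately.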
- shell> cd ${TC}/tc${YEAR}/bin/loadDb/
- shell> 2.AnalyzeInFiles ${YEAR}
- shell> cd ${TC}/tc${YEAR}/
- shell> ./bin/loadDb/3.LoadDb
4) Word-St Scores
shell> cd ${TEST}/TC/WsdTest/
shell> ${TEST}/TC/WsdTest/bin/2.TestWsd
shell> ${TEST}/TC/WsdTest/bin/3.TestWsdStats
shell> ${TEST}/TC/WsdTest/bin/4.TestAll
ST WSD Collections Tests (both train and test sets):
TC Version | Ambiguous Sentence (DC) | Ambiguous Sentence (WC) | Ambiguous Sentence (CS) | Ambiguous Sentences (DC) | Ambiguous Sentences (WC) | Ambiguous Sentences (CS) | Ti-AB (DC) | Ti-AB (WC) | Ti-AB (CS) |
---|---|---|---|---|---|---|---|---|---|
2007 | 74.61% | 75.00% | 74.91% | 74.95% | 75.39% | 75.05% | 74.05% | 74.32% | 74.32% |
2008 | 73.81% | 74.93% | 74.36% | 74.30% | 75.00% | 74.77% | 73.52% | 74.44% | 74.01% |
2009 | 77.37% | 77.11% | 76.91% | 76.79% | 76.72% | 76.62% | 76.13% | 76.65% | 76.12% |
2010 | 76.62% | 77.36% | 77.27% | 75.96% | 76.59% | 76.73% | 74.85% | 76.38% | 75.24% |
2011 | 77.11% | 77.53% | 77.24% | 76.00% | 77.10% | 76.49% | 74.82% | 76.81% | 75.55% |
shell> cd ${TEST}/TC/WsdTest2/
shell> ${TEST}/TC/WsdTest2/bin/2.TestWsd
shell> ${TEST}/TC/WsdTest2/bin/3.TestWsdStats
shell> ${TEST}/TC/WsdTest2/bin/4.TestAll
MSH WSD Set Tests:
The precision excludes answers that cannot be found by StWSD:
Precision/Weighted Precision Test for MSH WSD set (both ambiguous abbreviations and ambiguous terms):
TC Version | Ambiguous Sentence (DC) | Ambiguous Sentence (WC) | Ambiguous Sentence (CS) | Ambiguous Sentences (DC) | Ambiguous Sentences (WC) | Ambiguous Sentences (CS) | Ti-AB (DC) | Ti-AB (WC) | Ti-AB (CS) |
---|---|---|---|---|---|---|---|---|---|
2007 | 70.66% 71.90% | 70.58% 72.19% | 70.70% 72.13% | 70.56% 71.34% | 70.59% 71.40% | 70.58% 71.56% | 70.79% 70.84% | 70.76% 71.31% | 70.79% 70.98% |
2008 | 70.42% 70.88% | 70.49% 71.33% | 70.48% 71.02% | 69.85% 70.63% | 70.09% 71.27% | 70.06% 71.08% | 69.54% 69.79% | 69.30% 69.67% | 69.23% 69.57% |
2009 | 66.63% 66.91% | 66.21% 66.83% | 66.44% 66.72% | 66.46% 67.14% | 65.79% 66.47% | 64.23% 66.74% | 66.93% 66.96% | 66.36% 66.56% | 66.78% 66.81% |
2010 | 65.86% 65.62% | 65.69% 66.05% | 65.72% 65.92% | 65.62% 65.96% | 65.42% 65.93% | 65.58% 66.03% | 66.12% 65.73% | 65.83% 65.85% | 66.05% 65.83% |
2011 | 67.09% 66.64% | 66.76% 66.93% | 67.00% 66.76% | 66.90% 66.43% | 66.89% 67.21% | 66.64% 66.55% | 67.20% 66.35% | 67.06% 66.67% | 67.05% 66.34% |