Pre-Process: Word-Jdid-Wc-Dc table
- Description:
This file contains the final Word-Jdid-Wc-Dc scores for all words in the training set (MEDLINE). It is used as the input file for the JDI database.
- Input:
- Procedures & Java files:
  - GenerateWordJdidWcDcTable.java
    - Read the word counts and document counts for all word-Jdid pairs from file, calculate their scores, and send the results to the output file
      - Read total word count and document count for each word-Jdid from wordJdidWcDcGt1.txt
      - Read total (normalized) Wc signal and total Dc for all words from wordSignalWcDcScores.txt
      - Read jdDcNFactor for each Jdid from jdidDcNFactor.txt
      - Calculate word count scores and document count scores for all word-Jdid pairs:
        - word count score = (word count / total normalized Wc signal) * NFactor
        - document count score = (document count / total Dc) * NFactor
      - Print out Word-Jdid-Wc-Dc scores
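
The two score formulas in the procedure above can be sketched in Java as follows. This is a minimal illustration, not the code in GenerateWordJdidWcDcTable.java: the class name WordJdidScoreSketch, its method names, the sample values in main, and the rejection of a non-positive total are all assumptions made for the example. File parsing is left out because the input file layouts are not described here.

```java
// Hypothetical sketch class; names are illustrative and not taken from
// GenerateWordJdidWcDcTable.java.
class WordJdidScoreSketch {

    // Shared shape of both formulas: (count / total) * nFactor.
    static double score(double count, double total, double nFactor) {
        // Assumption: a non-positive total is treated as invalid input,
        // since it would make the ratio undefined or meaningless.
        if (total <= 0.0) {
            throw new IllegalArgumentException("total must be positive: " + total);
        }
        return (count / total) * nFactor;
    }

    // word count score = (word count / total normalized Wc signal) * NFactor
    static double wordCountScore(long wordCount, double totalWcSignal, double nFactor) {
        return score(wordCount, totalWcSignal, nFactor);
    }

    // document count score = (document count / total Dc) * NFactor
    static double documentCountScore(long documentCount, double totalDc, double nFactor) {
        return score(documentCount, totalDc, nFactor);
    }

    public static void main(String[] args) {
        // Made-up values for one word-Jdid pair, for illustration only.
        long wordCount = 5;
        long documentCount = 3;
        double totalWcSignal = 50.0; // total (normalized) Wc signal for the word
        double totalDc = 12.0;       // total Dc for the word
        double nFactor = 2.0;        // NFactor for the Jdid

        System.out.println("Wc score: " + wordCountScore(wordCount, totalWcSignal, nFactor));
        System.out.println("Dc score: " + documentCountScore(documentCount, totalDc, nFactor));
    }
}
```

Both scores use the same ratio-times-factor shape, so the sketch routes them through one helper; only the count and total passed in differ.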
- Output file: