Text Categorization

Pre-Process: Word-Jdid-Wc-Dc table

  • Description:
    This file contains the final Word-Jdid-Wc-Dc scores for all words in the training set (MEDLINE). It is used as the input file for the JDI database.

  • Input:

  • Procedures & Java files:
    • GenerateWordJdidWcDcTable.java
    • Read the word counts and document counts for all word-Jdid pairs from the input files, calculate their scores, and send the results to the output file
      • Read total word count and document count for each word-Jdid from wordJdidWcDcGt1.txt
      • Read total (normalized) Wc signal and total Dc for all words from wordSignalWcDcScores.txt
      • Read jdDcNFactor for each Jdid from jdidDcNFactor.txt
      • Calculate word count scores and document count scores for all word-Jdid:
        • word count score = (word count / total normalized Wc signal) * NFactor
        • document count score = (document count / total Dc) * NFactor
    • Print out Word-Jdid-Wc-Dc scores
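
The two score formulas above can be sketched in Java as follows. This is a minimal illustration, not the code in GenerateWordJdidWcDcTable.java: the class name, method names, and parameter names are hypothetical, and the values are passed in directly because the layout of the three input files is not described here.

```java
/**
 * Hypothetical sketch of the Word-Jdid scoring step.
 * All names are illustrative assumptions, not taken from the original source.
 */
final class WordJdidScoreSketch {
    private WordJdidScoreSketch() {}

    /** word count score = (word count / total normalized Wc signal) * NFactor */
    static double wcScore(long wordCount, double totalWcSignal, double nFactor) {
        if (totalWcSignal <= 0.0) {
            throw new IllegalArgumentException("total Wc signal must be positive");
        }
        return (wordCount / totalWcSignal) * nFactor;
    }

    /** document count score = (document count / total Dc) * NFactor */
    static double dcScore(long docCount, long totalDc, double nFactor) {
        if (totalDc <= 0) {
            throw new IllegalArgumentException("total Dc must be positive");
        }
        // Cast to double so the division is not truncated to an integer.
        return ((double) docCount / totalDc) * nFactor;
    }

    public static void main(String[] args) {
        // Made-up values for one word-Jdid pair, for illustration only.
        double nFactor = 2.0;
        System.out.println("wcScore=" + wcScore(3, 12.0, nFactor));
        System.out.println("dcScore=" + dcScore(1, 8, nFactor));
    }
}
```

In this sketch the word's totals are the denominators and the NFactor belongs to the Jdid, matching the order in which the three files are read in the steps above. The guards against a non-positive total are an added assumption.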

  • Output file: