Text Categorization

Pre-Process: Word-Jdid-Wc-Dc (Gt1)

  • Description:
    This file records the word count and document count of every word (Gt 1), broken down by each of the word's associated JDs in the training set (MEDLINE).

  • Input:

  • Java File & Algorithm:
    • GenerateWordJdidWcDc.java
      • Load all words from wordWcDcGt1.txt
      • Load JID-JDs from uiJidJds.${NUM}.txt
      • For each word (Gt 1), calculate its total word count under each associated JD
      • For each word (Gt 1), calculate its total document count under each associated JD
      • Write the results to wordJdidWcDcGt1.txt
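
The counting steps above can be sketched in Java as follows. This is a minimal illustration, not the actual GenerateWordJdidWcDc.java: the layouts of wordWcDcGt1.txt and uiJidJds.${NUM}.txt are not shown here, so the sketch works on in-memory data instead of reading those files. The class name WordJdAggregator, its method names, the collapsing of the JID-to-JD lookup into a per-document JD list, and the pipe-delimited word|JD|wc|dc line format are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper class; not part of the original tool.
class WordJdAggregator {

    /**
     * For every word in the vocabulary, accumulates per JD:
     *   index 0 = total word count (wc), index 1 = document count (dc).
     * docWordCounts.get(i) holds the word frequencies of document i;
     * docJds.get(i) holds the JDs assumed to be associated with document i.
     */
    static Map<String, Map<String, long[]>> aggregate(
            Set<String> vocabulary,
            List<Map<String, Integer>> docWordCounts,
            List<List<String>> docJds) {
        Map<String, Map<String, long[]>> result = new TreeMap<>();
        for (int i = 0; i < docWordCounts.size(); i++) {
            for (Map.Entry<String, Integer> e : docWordCounts.get(i).entrySet()) {
                if (!vocabulary.contains(e.getKey())) {
                    continue; // skip words outside the loaded word list
                }
                for (String jd : docJds.get(i)) {
                    long[] wcDc = result
                            .computeIfAbsent(e.getKey(), k -> new TreeMap<>())
                            .computeIfAbsent(jd, k -> new long[2]);
                    wcDc[0] += e.getValue(); // word count grows by the frequency
                    wcDc[1] += 1;            // document count grows by one
                }
            }
        }
        return result;
    }

    /** Renders one line per (word, JD) pair in an assumed word|JD|wc|dc layout. */
    static List<String> formatLines(Map<String, Map<String, long[]>> counts) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Map<String, long[]>> w : counts.entrySet()) {
            for (Map.Entry<String, long[]> j : w.getValue().entrySet()) {
                lines.add(w.getKey() + "|" + j.getKey() + "|"
                        + j.getValue()[0] + "|" + j.getValue()[1]);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Toy data standing in for the real input files.
        Set<String> vocabulary = new HashSet<>(Arrays.asList("heart", "cell"));
        Map<String, Integer> doc0 = new HashMap<>();
        doc0.put("heart", 2);
        doc0.put("cell", 1);
        doc0.put("rare", 1); // not in the vocabulary, so it is ignored
        Map<String, Integer> doc1 = new HashMap<>();
        doc1.put("heart", 1);
        List<Map<String, Integer>> docs = Arrays.asList(doc0, doc1);
        List<List<String>> jds = Arrays.asList(
                Arrays.asList("Cardiology"),
                Arrays.asList("Cardiology", "Cytology"));

        for (String line : formatLines(aggregate(vocabulary, docs, jds))) {
            System.out.println(line);
        }
    }
}
```

With the toy data in main, "heart" occurs three times across two documents tagged Cardiology, so its Cardiology entry carries wc 3 and dc 2. Sorted maps are used only to make the output order deterministic; the real tool may order its output differently.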

  • Output: