Text Categorization

Pre-Process: Word-Jdid-Wc-Dc (Gt1)

  • Description:
    This file lists, for every word (Gt 1), the word count and document count under each of its associated JDs in the training set (MEDLINE).

  • Input:
    • wordWcDcGt1.txt
    • uiJidJds.${NUM}.txt

  • Java File & Algorithm:
    • GenerateWordJdidWcDc.java
      • Load all words from wordWcDcGt1.txt
      • Load JID-JDs from uiJidJds.${NUM}.txt
      • For each word (Gt 1), calculate the total word count under each associated JD
      • For each word (Gt 1), calculate the total document count under each associated JD
      • Write the results to wordJdidWcDcGt1.txt
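
The aggregation steps above can be sketched roughly as follows. This is a minimal illustration, not the code of GenerateWordJdidWcDc.java: the class name WordJdidWcDcSketch, the Doc type, the in-memory inputs, and the printed "word|jdId|wc|dc" layout are all assumptions made for the example, since the actual file formats are not described here. It assumes each training document contributes its word occurrences to every JD associated with that document.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class WordJdidWcDcSketch {

    /** Hypothetical holder for one training document: its tokens and associated JD ids. */
    static class Doc {
        final List<String> tokens;
        final List<Integer> jdIds;

        Doc(List<String> tokens, List<Integer> jdIds) {
            this.tokens = tokens;
            this.jdIds = jdIds;
        }
    }

    /**
     * Returns word -> (JD id -> {wordCount, docCount}).
     * Only words present in vocab (standing in for the Gt 1 word list) are counted.
     */
    static Map<String, Map<Integer, int[]>> aggregate(Set<String> vocab, List<Doc> docs) {
        Map<String, Map<Integer, int[]>> result = new TreeMap<>();
        for (Doc doc : docs) {
            // Count occurrences of each vocabulary word within this document.
            Map<String, Integer> perDoc = new TreeMap<>();
            for (String token : doc.tokens) {
                if (vocab.contains(token)) {
                    perDoc.merge(token, 1, Integer::sum);
                }
            }
            // Credit those counts to every distinct JD associated with the document.
            Set<Integer> distinctJds = new TreeSet<>(doc.jdIds);
            for (Map.Entry<String, Integer> e : perDoc.entrySet()) {
                Map<Integer, int[]> byJd =
                        result.computeIfAbsent(e.getKey(), k -> new TreeMap<>());
                for (Integer jdId : distinctJds) {
                    int[] counts = byJd.computeIfAbsent(jdId, k -> new int[2]);
                    counts[0] += e.getValue(); // total word count for this word under this JD
                    counts[1] += 1;            // one more document containing this word
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy data; real input would come from wordWcDcGt1.txt and uiJidJds.${NUM}.txt.
        Set<String> vocab = new HashSet<>(Arrays.asList("cell", "gene"));
        List<Doc> docs = Arrays.asList(
                new Doc(Arrays.asList("cell", "cell", "gene", "rare"), Arrays.asList(1, 2)),
                new Doc(Arrays.asList("cell"), Arrays.asList(1)));

        // Print one line per (word, JD) pair in an assumed word|jdId|wc|dc layout.
        aggregate(vocab, docs).forEach((word, byJd) ->
                byJd.forEach((jdId, c) ->
                        System.out.println(word + "|" + jdId + "|" + c[0] + "|" + c[1])));
    }
}
```

Sorted maps are used here only so the output order is deterministic; a production run over MEDLINE-sized data would more likely stream the input files and might prefer hash maps.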

  • Output:
    • wordJdidWcDcGt1.txt