Text Categorization

Pre-Process: Sh-Jdid-Dc Table

  • Description:

    This file contains the MeSH SubHeading-Jdid-Dc scores. It is generated, loaded into a DB table, and then used to perform JD indexing on MeSH. The format of this file is:

    SubHeadingJdidDc Score

  • Input files:

  • Procedures & Java files:
    • GenerateShJdidDcTable
      • Read in the total document count for all SubHeadings (SH) from shDc.txt
      • Read in sh-Jdid-Dc from shJdidDc.txt
      • Read in jdidDcNFactor from jdidDcNFactor.txt
      • Calculate Dc Scores for all SubHeadings by:
        • Dc Score = (document count/total document count) * NFactor
      • Print out Sh-Jdid-Dc Score
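
    The score calculation in the steps above can be sketched in Java as follows. This is a minimal illustration, not the actual GenerateShJdidDcTable source: the class name ShJdidDcScoreSketch, the methods dcScore and scoreAll, and the sample keys and numbers are all hypothetical. File parsing is left out because the layouts of shDc.txt, shJdidDc.txt, and jdidDcNFactor.txt are not given here, so the sketch assumes the counts and the NFactor are already in memory.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not the actual GenerateShJdidDcTable implementation.
class ShJdidDcScoreSketch {

    /** Dc Score = (document count / total document count) * NFactor. */
    public static double dcScore(long docCount, long totalDocCount, double nFactor) {
        if (totalDocCount <= 0) {
            throw new IllegalArgumentException("total document count must be positive");
        }
        // Cast before dividing so the ratio is not truncated by integer division.
        return ((double) docCount / totalDocCount) * nFactor;
    }

    /**
     * Scores every entry of a (key -> document count) map.
     * Assumption: one total and one NFactor apply to all entries passed in.
     */
    public static Map<String, Double> scoreAll(Map<String, Long> docCounts,
                                               long totalDocCount,
                                               double nFactor) {
        Map<String, Double> scores = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : docCounts.entrySet()) {
            scores.put(e.getKey(), dcScore(e.getValue(), totalDocCount, nFactor));
        }
        return scores;
    }

    public static void main(String[] args) {
        // Made-up keys and counts for illustration only, not real data.
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("exampleKeyA", 25L);
        counts.put("exampleKeyB", 50L);

        // Print one "key score" pair per line, mirroring the final print step.
        for (Map.Entry<String, Double> e : scoreAll(counts, 100L, 2.0).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

    The cast to double matters: with integer operands, document count / total document count would truncate to 0 for every count smaller than the total.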

  • Output file:
    • shJdidDc.txt, used in TC.JDI database tables
      SubHeadingJdidDc Score