Text Categorization

Pre-Process: Mh-Jdid-Dc Table

  • Description:

    This file contains the MeSH Main Heading-Jdid-Dc scores. It is generated, loaded into a DB table, and then used to perform JD indexing on MeSH. The format of this file is:

    Main Heading | Jdid | Dc Score

  • Input files:
    • mhDc.txt
    • mhJdidDc.txt
    • jdDcNFactor.txt

  • Procedures & Java files:
    • GenerateMhJdidScoreTable
      • Read the total document count of all MHs from mhDc.txt
      • Read the Mh-Jdid-Dc entries from mhJdidDc.txt
      • Read the jdDcNFactor values from jdDcNFactor.txt
      • Calculate the Dc Scores for all Main Headings by:
        • Dc Score = (document count / total document count) * NFactor
      • Print out the Mh-Jdid-Dc scores
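
The score calculation above can be sketched in Java as follows. This is a minimal illustration of the formula only, not the actual GenerateMhJdidScoreTable source: the class name DcScoreSketch, the method dcScore, and the sample values are hypothetical, and the file parsing is left out because the input file layouts are not described here.

```java
class DcScoreSketch {
    /**
     * Dc Score = (document count / total document count) * NFactor.
     * All three arguments are assumed to have been read from the input
     * files already; the names are illustrative, not from the real tool.
     */
    static double dcScore(long documentCount, long totalDocumentCount, double nFactor) {
        if (totalDocumentCount <= 0) {
            throw new IllegalArgumentException("total document count must be positive");
        }
        // Cast to double before dividing so the ratio is not truncated
        // by integer division.
        return ((double) documentCount / totalDocumentCount) * nFactor;
    }

    public static void main(String[] args) {
        // Made-up sample values: 25 of 100 documents, NFactor of 2.0.
        System.out.println(dcScore(25, 100, 2.0));
    }
}
```

The cast to double matters: with two integer counts, a plain division would truncate every ratio below 1 to 0 and all scores would come out as zero.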

  • Output file:
    • mhJdidDcTable.txt, used in the TC.JDI database tables, in the format:
      Main Heading | Jdid | Dc Score