Text Categorization

Pre-Process: Ui-Jid-Jds

  • Description:
    This file contains the Ui-Jid-Jds information for the citations in the training set (MEDLINE citations).

  • Input:
    • MEDLINE 2004: /nfsvol/indaux/MEDLINE_baseline/2004/medline04n${NUM}.txt
    • Date created (DA) in the years 1999, 2000, or 2001
    • ${NUM} is the number of each file that includes citations with a DA in 1999, 2000, or 2001

    • Jds.txt
    • Jid-Ta-Jds

  • Java Files & Algorithm:
    • GenerateFilesFromMedLine.java
    • Use JID to get the Ui-Jid-Jds information
      • Read in all fields (PMID, TI, AB, TA, JID, RN, MH) from MEDLINE citations if DA is within the specified range
      • Look up the JDs for each citation by its JID in Jid-Ta-Jds
      • Read in JDs information from Jds.txt
      • Write PMID, JID, and JDs to uiJidJds${NUM}.txt
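
    The steps above can be sketched roughly as follows. This is a minimal
    illustration, not the actual GenerateFilesFromMedLine.java. It assumes
    that citations are in the MEDLINE tagged text layout ("PMID- ", "DA  - ",
    "JID - ") with a blank line between citations, that each Jid-Ta-Jds line
    looks like JID|TA|JDs, and that output lines are written as PMID|JID|JDs.
    The class name, method names, and both delimiters are hypothetical. The
    lookup in Jds.txt is omitted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not the original GenerateFilesFromMedLine.java.
public class UiJidJdsSketch {

    // Build a JID -> JDs map from Jid-Ta-Jds lines.
    // ASSUMPTION: each line is "JID|TA|JDs"; other lines are skipped.
    static Map<String, String> loadJidToJds(List<String> lines) {
        Map<String, String> jidToJds = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split("\\|", 3);
            if (parts.length == 3) {
                jidToJds.put(parts[0].trim(), parts[2].trim());
            }
        }
        return jidToJds;
    }

    // Keep citations whose DA year is in `years` and whose JID has JDs.
    // ASSUMPTION: MEDLINE tagged text, blank line between citations,
    // DA formatted as YYYYMMDD.
    static List<String> extract(List<String> medlineLines,
                                Map<String, String> jidToJds,
                                Set<String> years) {
        List<String> out = new ArrayList<>();
        String pmid = null;
        String jid = null;
        String da = null;
        // Iterate one step past the end so the last citation is flushed.
        for (int i = 0; i <= medlineLines.size(); i++) {
            String line = i < medlineLines.size() ? medlineLines.get(i) : "";
            if (line.trim().isEmpty()) {
                boolean inRange = da != null && da.length() >= 4
                        && years.contains(da.substring(0, 4));
                if (pmid != null && jid != null && inRange) {
                    String jds = jidToJds.get(jid);
                    if (jds != null) {
                        // ASSUMPTION: pipe-delimited output line.
                        out.add(pmid + "|" + jid + "|" + jds);
                    }
                }
                pmid = null;
                jid = null;
                da = null;
            } else if (line.startsWith("PMID- ")) {
                pmid = line.substring(6).trim();
            } else if (line.startsWith("JID - ")) {
                jid = line.substring(6).trim();
            } else if (line.startsWith("DA  - ")) {
                da = line.substring(6).trim();
            }
        }
        return out;
    }
}
```

    In this sketch, a citation is dropped when its DA year is outside the
    requested set or when its JID has no entry in the map. The returned lines
    would then be written to uiJidJds${NUM}.txt.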

  • Output: