Text Categorization

Data Set

Since the TC.2009 release, Java programs have been used to generate the tables for JDI, STI, and STRI. This page describes the complete data set that is fed into those Java programs to generate these tables. The files listed here are either used or generated during the pre-process step.

  • JDI Data Set (for ${YEAR})
    File | Version | Sources | Update | Notes
    lsi.xml | ${YEAR} | ftp://ftp.nlm.nih.gov/online/journals/ | Yes | -
    MEDLINE | ${YEAR} | /nsfvol/nls/MEDLINE_Baseline_Repository/${YEAR} | Yes | -
    MRCON | ${PRE_YEAR}AB | ash:/u03/umls/Releases/${PRE_YEAR}AB/Full/ORF/META/MRCON | Yes | -
    contractions.txt | - | TC.${PRE_YEAR} | No | -
    jds.txt | - | TC.${PRE_YEAR} | No | -
    shs.txt | - | TC.${PRE_YEAR} | No | -
    stopWords.txt | - | TC.${PRE_YEAR} | Yes | Manually update
    MedLineFiles.txt | - | Auto-Generate | Yes | Files list in MEDLINE
    MedLineYears.txt | - | Auto-Generate | Yes | Years in MEDLINE

  • STI Data Set (for ${YEAR})
    File | Version | Sources | Update | Notes
    MRSTY | ${PRE_YEAR}AB | ash:/u03/umls/Releases/${PRE_YEAR}AB/Full/ORF/META/MRSTY | Yes | -
    MRCONSO.RRF | ${PRE_YEAR}AB | ash:/u03/umls/Releases/${PRE_YEAR}AB/Full/RRF/META/MRCONSO.RRF | Yes | -
    SRDEF.txt | - | http://semanticnetwork.nlm.nih.gov/Download/RelationalFiles/SRDEF | Yes | -
    stGroup.txt | - | http://semanticnetwork.nlm.nih.gov/SemGroups/SemGroups.txt | Yes | Needs further modifications

  • STRI Data Set (for ${YEAR})
    All tables for STRI are generated from the JDI and STI tables. No new data sets are needed.
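
The source locations above use the placeholders ${YEAR} and ${PRE_YEAR}. As a rough illustration of how such templates could be expanded for a given release, here is a minimal Java sketch. It is not the actual TC pre-process code: the class name DataSetPaths and the method expand are hypothetical, and it assumes that ${PRE_YEAR} means the year immediately before ${YEAR}, which this page does not state explicitly.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the TC distribution.
public class DataSetPaths {

    // Expands ${YEAR} and ${PRE_YEAR} in a source template.
    // Assumption: ${PRE_YEAR} is the year before ${YEAR}.
    public static String expand(String template, int year) {
        return template
                .replace("${PRE_YEAR}", Integer.toString(year - 1))
                .replace("${YEAR}", Integer.toString(year));
    }

    public static void main(String[] args) {
        int year = 2009; // example release year

        // A few source templates copied from the tables on this page.
        Map<String, String> sources = new LinkedHashMap<>();
        sources.put("MEDLINE", "/nsfvol/nls/MEDLINE_Baseline_Repository/${YEAR}");
        sources.put("MRCON", "ash:/u03/umls/Releases/${PRE_YEAR}AB/Full/ORF/META/MRCON");
        sources.put("MRSTY", "ash:/u03/umls/Releases/${PRE_YEAR}AB/Full/ORF/META/MRSTY");
        sources.put("stopWords.txt", "TC.${PRE_YEAR}");

        // Print each file name with its expanded source location.
        for (Map.Entry<String, String> entry : sources.entrySet()) {
            System.out.println(entry.getKey() + " -> " + expand(entry.getValue(), year));
        }
    }
}
```

String.replace performs literal (non-regex) replacement, so the `$` and braces in the placeholders need no escaping. If ${PRE_YEAR} is defined differently in a given release, only the `year - 1` expression would need to change.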