The SPECIALIST Lexicon

Lexicon Words Stats

I. Introduction

This page describes the programs used to compute statistics on Lexicon words, using MEDLINE as the source of frequency data (word count, WC, and document count, DC).

II. Detailed Process

  • Dir: ${MULTIWORDS}/bin/11.LexWords
  • Programs:
    Step | Description | Inputs | Outputs | Notes
    MEDLINE Unigram Spectrum Analysis
    1. Group raw unigram by core-term.lc
    • Program: NGramUtil.GrepTermsSort
    • Input: ./Medline/unigram.${YEAR}
    • Output: ./Medline/unigram.${YEAR}.core.lc
    • Output: ./Medline/unigram.${YEAR}.core.lc.detail
    • Note: Auto link unigram
    2. Get MEDLINE unigram WC Frequency Spectrum
    • Program: NGramUtil.GetBasicHistogram
    • Input: ./Medline/unigram.${YEAR}.core.lc
    • Output: ./Medline/unigram.${YEAR}.core.lc.his.csv
    • Note: Used as input data for Excel diagram
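
The two unigram steps above can be sketched as follows. This is a minimal illustration, not the NGramUtil implementation: it assumes each raw unigram record is laid out as DC|WC|term, and it approximates the core-term.lc normalization with a plain strip and lowercase. The function names `group_by_core_term` and `wc_spectrum` are hypothetical.

```python
from collections import Counter, defaultdict


def group_by_core_term(lines):
    """Sum WC over records whose term is identical after normalization.

    Assumptions (not confirmed by the source): each record is
    'DC|WC|term', and normalization is only strip + lowercase.
    """
    wc_by_term = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _dc, wc, term = line.split("|", 2)
        wc_by_term[term.strip().lower()] += int(wc)
    return dict(wc_by_term)


def wc_spectrum(wc_by_term):
    """Return sorted (WC, number of distinct terms with that WC) pairs."""
    return sorted(Counter(wc_by_term.values()).items())


if __name__ == "__main__":
    sample = [
        "10|12|Cell",
        "3|4|cell",
        "5|16|protein",
        "1|1|zymogen",
    ]
    grouped = group_by_core_term(sample)
    # One CSV row per distinct WC value, suitable for charting.
    for wc, n_terms in wc_spectrum(grouped):
        print(f"{wc},{n_terms}")
```

The spectrum here is a frequency-of-frequencies table: for each WC value, how many distinct core terms have that WC.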
    Lexicon Word Spectrum Analysis
    10. Get Lexicon single word frequency spectrum
    • Program: LexWords.GetLexWordFreSpectrum
    • TYPE: 0 = all, 1 = SW (single word), 2 = MW (multiword)
    • Input: ${IN_DIR}inflVars.data
    • Input: ./Medline/unigram.${YEAR}.core.lc
    • Output: ./LexSpec/sWord.b.csv (Lexicon words in MEDLINE or not)
    • Output: ./LexSpec/sWord.l.csv (Lexicon words with MEDLINE WC)
    • Output: ./LexSpec/sWord.rpt
    • Output: ./LexSpec/sWord.sum
    11. Group distilled n-gram set by core-term.lc
    • Program: NGramUtil.GroupByCoreTerm
    • Input: ${NGRAM_DIR}nGrams/distilledNGram.${YEAR}
    • Output: ${NGRAM_DIR}nGrams/distilledNGram.${YEAR}.core.lc
    • Output: ${NGRAM_DIR}nGrams/distilledNGram.${YEAR}.core.lc.detail
    • Note: Same as step-11 in 06.NGramUtil
    12. Get all words frequency spectrum
    • Program: LexWords.GetLexWordFreSpectrum
    • TYPE: 0 = all, 1 = SW (single word), 2 = MW (multiword)
    • Input: ${IN_DIR}inflVars.data
    • Input: ${NGRAM_DIR}nGrams/distilledNGram.${YEAR}.core.lc
    • Output: ./LexSpec/aWord.b.csv (Lexicon words in MEDLINE or not)
    • Output: ./LexSpec/aWord.l.csv (Lexicon words with MEDLINE WC)
    • Output: ./LexSpec/aWord.rpt
    • Output: aWord.sum
    13. Get multiwords frequency spectrum
    • Program: LexWords.GetLexWordFreSpectrum
    • TYPE: 0 = all, 1 = SW (single word), 2 = MW (multiword)
    • Input: ${IN_DIR}inflVars.data
    • Input: ${NGRAM_DIR}nGrams/distilledNGram.${YEAR}.core.lc
    • Output: ./LexSpec/mWord.b.csv (Lexicon words in MEDLINE or not)
    • Output: ./LexSpec/mWord.l.csv (Lexicon words with MEDLINE WC)
    • Output: ./LexSpec/mWord.rpt
    • Output: ./LexSpec/mWord.sum
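
The Lexicon word spectrum steps above amount to looking up each Lexicon word in a MEDLINE WC table, optionally restricted by TYPE. The sketch below illustrates that lookup under stated assumptions: Lexicon words and MEDLINE terms are compared in lowercase, a multiword is any term containing whitespace, and a missing term counts as WC 0. The names `lex_word_coverage` and `is_multiword` are hypothetical, and the file parsing done by LexWords.GetLexWordFreSpectrum is not reproduced.

```python
TYPE_ALL, TYPE_SW, TYPE_MW = 0, 1, 2  # mirrors TYPE 0: all, 1: SW, 2: MW


def is_multiword(term):
    """Assumption: a multiword is a term with more than one token."""
    return len(term.split()) > 1


def lex_word_coverage(lex_words, medline_wc, word_type=TYPE_ALL):
    """Return (word, in_medline, wc) rows for the selected Lexicon words.

    lex_words:  iterable of Lexicon words (e.g. inflectional variants)
    medline_wc: dict mapping lowercased MEDLINE term -> word count
    """
    rows = []
    for word in sorted({w.strip().lower() for w in lex_words}):
        if word_type == TYPE_SW and is_multiword(word):
            continue
        if word_type == TYPE_MW and not is_multiword(word):
            continue
        wc = medline_wc.get(word, 0)  # absent from MEDLINE -> WC 0
        rows.append((word, wc > 0, wc))
    return rows


if __name__ == "__main__":
    lexicon = ["aspirin", "Aspirin", "heart attack", "xylophage"]
    medline = {"aspirin": 120, "heart attack": 45}
    for word, found, wc in lex_word_coverage(lexicon, medline, TYPE_SW):
        print(f"{word}|{found}|{wc}")
```

In this sketch the boolean column corresponds to the "in MEDLINE or not" view and the WC column to the "with MEDLINE WC" view described for the .b.csv and .l.csv outputs.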
    Lexicon Word Histogram Analysis (used in the AMIA paper)
    20. Get normTerm.lc from inflVars
    • Program: CandidateUtil.ToCoreTerm
    • Input: ${IN_DIR}inflVars.data.f1
    • Output: ./LexHist/inflVars.data.f1.core.lc
    • Note: Get the norm-term.lc from inflVars
    21. Split single word and multiwords from lexicon inflVars
    • Program: LexWords.SplitSingleMultiWords
    • Input: ./LexHist/inflVars.data.f1.core
    • Output: inflVars.data.f1.core.mw
    • Output: inflVars.data.f1.core.sw
    • Note: Same as step-11 in 06.NGramUtil
    22. Add WC to Lexicon single word
    • Program: NGramUtil.AddWcToCoreTerm
    • Input: ./LexHist/inflVars.data.f1.core.sw
    • Input: ${NGRAM_DIR}nGrams/nGramSet.${YEAR}.30.core.lc
    • Output: ./LexHist/inflVars.data.f1.core.sw.wc
    23. Add WC to Lexicon multiword
    • Program: NGramUtil.AddWcToCoreTerm
    • Input: ./LexHist/inflVars.data.f1.core.mw
    • Input: ${NGRAM_DIR}nGrams/nGramSet.${YEAR}.30.core.lc
    • Output: ./LexHist/inflVars.data.f1.core.mw.wc
    24. Get WC histogram on Lexicon single word
    • Program: CandidateUtil.HistogramUtil
    • Input: ./LexHist/inflVars.data.f1.core.sw.wc
    • Output: ./LexHist/inflVars.data.f1.core.sw.wc.his.minWC-maxWc.secNO.csv
    • Note: Used as input data for Excel diagram
    25. Get WC histogram on Lexicon multiword
    • Program: CandidateUtil.HistogramUtil
    • Input: ./LexHist/inflVars.data.f1.core.mw.wc
    • Output: ./LexHist/inflVars.data.f1.core.mw.wc.his.minWC-maxWc.secNO.csv
    • Note: Used as input data for Excel diagram
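
The histogram output names above (his.minWC-maxWc.secNO.csv) suggest a histogram over a WC range split into a number of sections. The sketch below shows one way such a histogram could be computed. It is an assumption, not the CandidateUtil.HistogramUtil algorithm: it uses equal-width sections over the closed range [min_wc, max_wc] and ignores values outside that range. The function name `wc_histogram` and its parameters are hypothetical.

```python
def wc_histogram(wcs, min_wc, max_wc, sec_no):
    """Count WC values in sec_no equal-width sections over [min_wc, max_wc].

    Returns a list of (section_start, section_end, count) tuples.
    Values outside the range are ignored; max_wc falls in the last section.
    """
    if sec_no <= 0 or max_wc <= min_wc:
        raise ValueError("need sec_no > 0 and max_wc > min_wc")
    width = (max_wc - min_wc) / sec_no
    counts = [0] * sec_no
    for wc in wcs:
        if wc < min_wc or wc > max_wc:
            continue
        # Clamp so that wc == max_wc lands in the final section.
        idx = min(int((wc - min_wc) / width), sec_no - 1)
        counts[idx] += 1
    return [
        (min_wc + i * width, min_wc + (i + 1) * width, count)
        for i, count in enumerate(counts)
    ]


if __name__ == "__main__":
    sample_wcs = [0, 5, 10, 50, 99, 100, 150]
    # One CSV row per section, suitable for charting.
    for start, end, count in wc_histogram(sample_wcs, 0, 100, 4):
        print(f"{start:g}-{end:g},{count}")
```

Equal-width sections are the simplest choice; WC distributions are typically heavy-tailed, so logarithmic sections may be a better fit for some diagrams.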