The SPECIALIST Lexicon

Multiwords: Rules-based Filter

I. Rules

The following rules filter invalid multiwords out of MEDLINE n-grams. They are listed in the same sequential order in which the program applies them:

Type | Description | Examples (with element word "mellitus")
Default: valid candidate multiword
RuleType.RT_TBD | Candidate multiwords
  • mellitus
  • diabetes mellitus
Check if the whole n-gram is in Lexicon
RuleType.RT_W_LEX_EM | Whole n-gram: exact match
  • diabetes mellitus
  • insulin-dependent diabetes mellitus
RuleType.RT_W_LEX_LC | Whole n-gram: match (lowercase)
  • DIABETES MELLITUS
  • Insulin-dependent diabetes mellitus
RuleType.RT_W_LEX_HT_PUNC | Whole n-gram: match (remove head & tail punctuation)
  • diabetes mellitus,
  • (diabetes mellitus,
  • diabetes mellitus),
RuleType.RT_W_LEX_LC_HT_PUNC | Whole n-gram: match (lowercase, remove head & tail punctuation)
  • [Diabetes mellitus
  • DIABETES MELLITUS]
  • [Diabetes mellitus]
RuleType.RT_W_LEX_PUNC | Whole n-gram: match (remove punctuation)
  • diabetes mellitus -
RuleType.RT_W_LEX_LC_PUNC | Whole n-gram: match (lowercase, remove punctuation)
  • DIABETES MELLITUS -
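
The whole n-gram checks above can be read as a cascade of increasingly aggressive normalizations, each followed by a Lexicon lookup. The sketch below is only an illustration of that reading, not the program's actual code: `LEXICON` is a hypothetical two-entry stand-in for the Lexicon, the function names are invented, and the exact punctuation handling (Python's `string.punctuation`, with whitespace collapsed only when all punctuation is removed) is an assumption chosen so that the examples in the table land on the rules they are listed under.

```python
import string

# Hypothetical stand-in for the SPECIALIST Lexicon (assumption for this sketch).
LEXICON = {"diabetes mellitus", "insulin-dependent diabetes mellitus"}


def strip_head_tail_punc(text):
    """Remove punctuation characters from the two ends only (whitespace is kept)."""
    return text.strip(string.punctuation)


def remove_all_punc(text):
    """Drop every punctuation character, then collapse runs of whitespace."""
    kept = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(kept.split())


# (rule name, normalizer) pairs, tried in the order the table lists them.
WHOLE_NGRAM_RULES = [
    ("RT_W_LEX_EM", lambda s: s),
    ("RT_W_LEX_LC", lambda s: s.lower()),
    ("RT_W_LEX_HT_PUNC", strip_head_tail_punc),
    ("RT_W_LEX_LC_HT_PUNC", lambda s: strip_head_tail_punc(s.lower())),
    ("RT_W_LEX_PUNC", remove_all_punc),
    ("RT_W_LEX_LC_PUNC", lambda s: remove_all_punc(s.lower())),
]


def match_whole_ngram(ngram, lexicon=LEXICON):
    """Return the first whole n-gram rule that fires, or RT_TBD if none does."""
    for name, normalize in WHOLE_NGRAM_RULES:
        if normalize(ngram) in lexicon:
            return name
    return "RT_TBD"


for example in ["diabetes mellitus", "DIABETES MELLITUS", "(diabetes mellitus,",
                "[Diabetes mellitus]", "diabetes mellitus -", "mellitus"]:
    print(f"{example!r}: {match_whole_ngram(example)}")
```

Because the rules are tried in order, an n-gram is tagged with the least invasive normalization that produces a Lexicon match; anything that never matches keeps the default `RT_TBD` tag.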
Check the tail (ending) word of n-gram
RuleType.RT_T_ABB | tail word - acronym in parentheses, in one of these patterns:
  • (UPPERCASE)
  • (UPPERCASE),
  • (UPPERCASE).
  • (UPPERCASE):
Examples:
  • mellitus (DM)
  • mellitus (DM),
RuleType.RT_T_PREP | tail word - preposition
  • mellitus in
  • diabetes mellitus, but
RuleType.RT_T_CONJ | tail word - conjunction
  • mellitus or
  • Diabetes mellitus and
RuleType.RT_T_AUX | tail word - auxiliary
  • mellitus is
  • Diabetes mellitus have
RuleType.RT_T_MODAL | tail word - modal
  • mellitus may
  • diabetes mellitus should
RuleType.RT_T_COMPL | tail word - complementizer
  • diabetes mellitus that
RuleType.RT_T_DET | tail word - determiner
  • mellitus: a
  • mellitus and the
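
The tail word checks above can be sketched as a lookup of the last whitespace-separated token against a series of word classes, tried in the listed order. This is a minimal illustration under stated assumptions: the word lists are hypothetical and deliberately tiny (they contain only words seen in the examples), the regular expression is one possible reading of the `(UPPERCASE)` patterns, and `check_tail_word` is an invented name rather than part of the actual program.

```python
import re

# Hypothetical, deliberately tiny word classes (assumption for this sketch);
# a real filter would presumably take them from the Lexicon.
TAIL_WORD_CLASSES = [
    ("RT_T_PREP", {"in", "of", "but"}),
    ("RT_T_CONJ", {"or", "and"}),
    ("RT_T_AUX", {"is", "have"}),
    ("RT_T_MODAL", {"may", "should"}),
    ("RT_T_COMPL", {"that"}),
    ("RT_T_DET", {"a", "the"}),
]

# A parenthesized uppercase token, optionally followed by "," "." or ":".
ACRONYM_TAIL = re.compile(r"^\([A-Z]+\)[,.:]?$")


def check_tail_word(ngram):
    """Return the first tail-word rule that fires, or RT_TBD if none does."""
    words = ngram.split()
    if not words:
        return "RT_TBD"
    tail = words[-1]
    if ACRONYM_TAIL.match(tail):
        return "RT_T_ABB"
    for name, members in TAIL_WORD_CLASSES:
        if tail.lower() in members:
            return name
    return "RT_TBD"


for example in ["mellitus (DM),", "mellitus in", "Diabetes mellitus and",
                "mellitus may", "mellitus: a", "diabetes mellitus"]:
    print(f"{example!r}: {check_tail_word(example)}")
```

Order matters when a word belongs to more than one class: in this sketch "but" sits in the preposition list, so "diabetes mellitus, but" is tagged `RT_T_PREP` before the conjunction check is reached, which matches the table's example.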
Check the head (beginning) word of n-gram
RuleType.RT_H_PREP | head word - preposition
  • of diabetes mellitus
  • in diabetes mellitus,
RuleType.RT_H_CONJ | head word - conjunction
  • or diabetes mellitus
  • and diabetes mellitus:
RuleType.RT_H_AUX | head word - auxiliary
  • were diabetes mellitus
  • have diabetes mellitus,
RuleType.RT_H_COMPL | head word - complementizer
  • that diabetes mellitus
RuleType.RT_H_MODAL | head word - modal
  •