Multiwords: Rules-based Filter
I. Rules
The following rules filter out invalid multiwords from MEDLINE n-grams. They are listed in the same sequential order in which the program applies them:
Type | Description | Examples (with element word "mellitus") |
---|---|---|
Default: valid candidate multiword | | |
RuleType.RT_TBD | Candidate multiwords | |
Check if the whole n-gram is in Lexicon | | |
RuleType.RT_W_LEX_EM | Whole n-gram: exact match | |
RuleType.RT_W_LEX_LC | Whole n-gram: match (lowercase) | |
RuleType.RT_W_LEX_HT_PUNC | Whole n-gram: match (remove head & tail punctuation) | |
RuleType.RT_W_LEX_LC_HT_PUNC | Whole n-gram: match (lowercase, remove head & tail punctuation) | |
RuleType.RT_W_LEX_PUNC | Whole n-gram: match (remove punctuation) | |
RuleType.RT_W_LEX_LC_PUNC | Whole n-gram: match (lowercase, remove punctuation) | |
Check the tail (ending) word of n-gram | | |
RuleType.RT_T_ABB | tail word - acronym in parentheses | |
RuleType.RT_T_PREP | tail word - preposition | |
RuleType.RT_T_CONJ | tail word - conjunction | |
RuleType.RT_T_AUX | tail word - auxiliary | |
RuleType.RT_T_MODAL | tail word - modal | |
RuleType.RT_T_COMPL | tail word - complementizer | |
RuleType.RT_T_DET | tail word - determiner | |
Check the head (beginning) word of n-gram | | |
RuleType.RT_H_PREP | head word - preposition | |
RuleType.RT_H_CONJ | head word - conjunction | |
RuleType.RT_H_AUX | head word - auxiliary | |
RuleType.RT_H_COMPL | head word - complementizer | |
RuleType.RT_H_MODAL | head word - modal | |
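
The ordered rule check described in the table can be sketched in Python as follows. This is a minimal illustration, not the program's actual implementation. The function name `classify`, the tiny word lists, the acronym pattern, and the punctuation handling are all hypothetical; the real filter presumably draws word categories from the Lexicon. The sketch also assumes that `RT_H_COMPL` refers to complementizers, as the identifier suggests, and that the first matching rule determines the result.

```python
import re
import string
from enum import Enum

# Rule types in the same order as the table above.
RuleType = Enum("RuleType", [
    "RT_TBD",
    "RT_W_LEX_EM", "RT_W_LEX_LC", "RT_W_LEX_HT_PUNC",
    "RT_W_LEX_LC_HT_PUNC", "RT_W_LEX_PUNC", "RT_W_LEX_LC_PUNC",
    "RT_T_ABB", "RT_T_PREP", "RT_T_CONJ", "RT_T_AUX",
    "RT_T_MODAL", "RT_T_COMPL", "RT_T_DET",
    "RT_H_PREP", "RT_H_CONJ", "RT_H_AUX", "RT_H_COMPL", "RT_H_MODAL",
])

# Hypothetical toy word lists, for illustration only.
PREP = {"of", "in", "with", "for"}
CONJ = {"and", "or", "but"}
AUX = {"is", "are", "was", "be", "has"}
MODAL = {"can", "may", "must", "should"}
COMPL = {"that"}
DET = {"the", "a", "an", "this"}

TAIL_RULES = [
    (RuleType.RT_T_PREP, PREP), (RuleType.RT_T_CONJ, CONJ),
    (RuleType.RT_T_AUX, AUX), (RuleType.RT_T_MODAL, MODAL),
    (RuleType.RT_T_COMPL, COMPL), (RuleType.RT_T_DET, DET),
]
HEAD_RULES = [
    (RuleType.RT_H_PREP, PREP), (RuleType.RT_H_CONJ, CONJ),
    (RuleType.RT_H_AUX, AUX), (RuleType.RT_H_COMPL, COMPL),
    (RuleType.RT_H_MODAL, MODAL),
]

# Assumed shape of an "acronym in parentheses" token, e.g. "(DM)".
ACRONYM_IN_PARENS = re.compile(r"^\([A-Z][A-Za-z0-9-]*\)$")
_DROP_PUNCT = str.maketrans("", "", string.punctuation)


def classify(ngram, lexicon):
    """Return the first rule that rejects the n-gram, or RT_TBD if none does."""
    lex_lc = {term.lower() for term in lexicon}
    ht = ngram.strip(string.punctuation)   # remove head & tail punctuation
    no_punct = ngram.translate(_DROP_PUNCT)  # remove all punctuation

    # 1. Whole n-gram already in the Lexicon (six normalization variants).
    whole_checks = [
        (RuleType.RT_W_LEX_EM, ngram, lexicon),
        (RuleType.RT_W_LEX_LC, ngram.lower(), lex_lc),
        (RuleType.RT_W_LEX_HT_PUNC, ht, lexicon),
        (RuleType.RT_W_LEX_LC_HT_PUNC, ht.lower(), lex_lc),
        (RuleType.RT_W_LEX_PUNC, no_punct, lexicon),
        (RuleType.RT_W_LEX_LC_PUNC, no_punct.lower(), lex_lc),
    ]
    for rule, key, terms in whole_checks:
        if key in terms:
            return rule

    words = ngram.split()
    if not words:
        return RuleType.RT_TBD

    # 2. Tail (ending) word checks.
    if ACRONYM_IN_PARENS.match(words[-1]):
        return RuleType.RT_T_ABB
    for rule, wordset in TAIL_RULES:
        if words[-1].lower() in wordset:
            return rule

    # 3. Head (beginning) word checks.
    for rule, wordset in HEAD_RULES:
        if words[0].lower() in wordset:
            return rule

    # Default: still a valid candidate multiword.
    return RuleType.RT_TBD


if __name__ == "__main__":
    toy_lexicon = {"diabetes mellitus"}  # hypothetical one-entry Lexicon
    for text in ["diabetes mellitus,", "mellitus in", "and mellitus"]:
        print(text, "->", classify(text, toy_lexicon).name)
```

Because the checks run in a fixed order, an n-gram that could trigger several rules is tagged only with the earliest one, which mirrors the sequential ordering of the table.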