Multiwords: Rules-based Filter
I. Rules
The following rules are used to filter out invalid multiwords from MEDLINE n-grams. They are listed in the same sequential order in which the program applies them:
Type | Description | Examples (with element word "mellitus") |
---|---|---|
Default: valid candidate multiword | | |
RuleType.RT_TBD | Candidate multiwords | |
Check if the whole n-gram is in the Lexicon | | |
RuleType.RT_W_LEX_EM | Whole n-gram: exact match | |
RuleType.RT_W_LEX_LC | Whole n-gram: match (lowercase) | |
RuleType.RT_W_LEX_HT_PUNC | Whole n-gram: match (remove head & tail punctuation) | |
RuleType.RT_W_LEX_LC_HT_PUNC | Whole n-gram: match (lowercase, remove head & tail punctuation) | |
RuleType.RT_W_LEX_PUNC | Whole n-gram: match (remove punctuation) | |
RuleType.RT_W_LEX_LC_PUNC | Whole n-gram: match (lowercase, remove punctuation) | |
Check the tail (ending) word of the n-gram | | |
RuleType.RT_T_ABB | tail word - acronym in parentheses | |
RuleType.RT_T_PREP | tail word - preposition | |
RuleType.RT_T_CONJ | tail word - conjunction | |
RuleType.RT_T_AUX | tail word - auxiliary | |
RuleType.RT_T_MODAL | tail word - modal | |
RuleType.RT_T_COMPL | tail word - complementizer | |
RuleType.RT_T_DET | tail word - determiner | |
Check the head (beginning) word of the n-gram | | |
RuleType.RT_H_PREP | head word - preposition | |
RuleType.RT_H_CONJ | head word - conjunction | |
RuleType.RT_H_AUX | head word - auxiliary | |
RuleType.RT_H_COMPL | head word - complementizer | |
RuleType.RT_H_MODAL | head word - modal | |
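The rule order in the table can be sketched in Python as a first-match classifier. This is a minimal sketch under stated assumptions, not the program's implementation. The `RuleType` member names come from the table, but the Python enum, `build_lexicon_index`, `classify`, and the tiny word lists are hypothetical. The word lists stand in for the Lexicon's word classes. Tokenization on whitespace, the `(DM)`-style acronym pattern, and deleting punctuation rather than replacing it with spaces are also assumptions. `RT_H_COMPL` is treated as a complementizer check, following its name. Each n-gram is tagged with the first rule that fires. An n-gram that no rule catches keeps the default `RT_TBD`.

```python
import re
import string
from enum import Enum, auto


class RuleType(Enum):
    """Rule names copied from the table; this Python enum is a hypothetical mirror."""
    RT_TBD = auto()  # default: valid candidate multiword
    RT_W_LEX_EM = auto()
    RT_W_LEX_LC = auto()
    RT_W_LEX_HT_PUNC = auto()
    RT_W_LEX_LC_HT_PUNC = auto()
    RT_W_LEX_PUNC = auto()
    RT_W_LEX_LC_PUNC = auto()
    RT_T_ABB = auto()
    RT_T_PREP = auto()
    RT_T_CONJ = auto()
    RT_T_AUX = auto()
    RT_T_MODAL = auto()
    RT_T_COMPL = auto()
    RT_T_DET = auto()
    RT_H_PREP = auto()
    RT_H_CONJ = auto()
    RT_H_AUX = auto()
    RT_H_COMPL = auto()
    RT_H_MODAL = auto()


# Hypothetical sample word lists; a real filter would use Lexicon word classes.
PREP = {"of", "in", "with", "for", "to", "by", "on", "from"}
CONJ = {"and", "or", "but", "nor"}
AUX = {"is", "are", "was", "were", "be", "been", "has", "have", "had"}
MODAL = {"can", "could", "may", "might", "must", "shall", "should", "will", "would"}
COMPL = {"that", "whether", "if"}
DET = {"a", "an", "the", "this", "these", "those", "some"}

# Assumption: an acronym tail word looks like "(DM)": a capital, then letters/digits.
ABB_RE = re.compile(r"\([A-Z][A-Za-z0-9]*\)")


def strip_ht_punc(text):
    """Remove punctuation (and spaces) from the head and tail only."""
    return text.strip(string.punctuation + " ")


def remove_punc(text):
    """Delete every punctuation character, then collapse whitespace (an assumption)."""
    return " ".join(text.translate(str.maketrans("", "", string.punctuation)).split())


# Whole n-gram rules in table order; each normalizer is applied to both the
# n-gram and the Lexicon terms before comparing them.
WHOLE_NGRAM_RULES = [
    (RuleType.RT_W_LEX_EM, lambda s: s),
    (RuleType.RT_W_LEX_LC, str.lower),
    (RuleType.RT_W_LEX_HT_PUNC, strip_ht_punc),
    (RuleType.RT_W_LEX_LC_HT_PUNC, lambda s: strip_ht_punc(s).lower()),
    (RuleType.RT_W_LEX_PUNC, remove_punc),
    (RuleType.RT_W_LEX_LC_PUNC, lambda s: remove_punc(s).lower()),
]
TAIL_RULES = [
    (RuleType.RT_T_PREP, PREP), (RuleType.RT_T_CONJ, CONJ),
    (RuleType.RT_T_AUX, AUX), (RuleType.RT_T_MODAL, MODAL),
    (RuleType.RT_T_COMPL, COMPL), (RuleType.RT_T_DET, DET),
]
HEAD_RULES = [
    (RuleType.RT_H_PREP, PREP), (RuleType.RT_H_CONJ, CONJ),
    (RuleType.RT_H_AUX, AUX), (RuleType.RT_H_COMPL, COMPL),
    (RuleType.RT_H_MODAL, MODAL),
]


def build_lexicon_index(lexicon):
    """Pre-normalize every Lexicon term once per whole n-gram rule."""
    return [(rule, norm, {norm(term) for term in lexicon})
            for rule, norm in WHOLE_NGRAM_RULES]


def classify(ngram, lex_index):
    """Return the first RuleType that fires, in table order; RT_TBD if none does."""
    for rule, norm, terms in lex_index:
        if norm(ngram) in terms:
            return rule
    words = ngram.split()  # assumption: whitespace tokenization
    if not words:
        return RuleType.RT_TBD  # empty input: no head or tail word to check
    head, tail = words[0], words[-1]
    if ABB_RE.fullmatch(tail):
        return RuleType.RT_T_ABB
    for rule, word_class in TAIL_RULES:  # tail checks run before head checks
        if tail.lower() in word_class:
            return rule
    for rule, word_class in HEAD_RULES:
        if head.lower() in word_class:
            return rule
    return RuleType.RT_TBD
```

For example, with a one-entry Lexicon `{"diabetes mellitus"}`, `classify("Diabetes Mellitus.", index)` returns `RuleType.RT_W_LEX_LC_HT_PUNC`. `classify("mellitus and", index)` returns `RuleType.RT_T_CONJ`. `classify("insulin-dependent diabetes mellitus", index)` falls through to `RuleType.RT_TBD`. These inputs are illustrative and are not the examples from the original table. Because the check stops at the first match, the order of the lists encodes the sequence given in the table.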