N-gram Set by Prediction Filter
A prediction-filter approach was developed to work around limited memory. It retrieves an approximate n-gram set as an alternative to exhaustive generation. However, this approach is not comprehensive and should be replaced by a more thorough one.
I. Prediction Filter:
Use the frequency (NWC) of normalized n-gram terms as a filter to generate (n+1)-gram terms.
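The filtering idea can be illustrated with a minimal sketch. It rests on several assumptions that the text does not confirm: normalization is taken to be lowercasing plus whitespace collapsing, the filter is applied only to the leading n-gram of each candidate (n+1)-gram, and the names `normalize`, `count_ngrams`, `passing_terms`, and the `min_count` threshold are hypothetical.

```python
from collections import Counter


def normalize(term):
    # Assumption: normalization = lowercase + collapse whitespace.
    return " ".join(term.lower().split())


def count_ngrams(sentences, n, allowed_prefixes=None):
    """Count normalized n-grams.

    If allowed_prefixes is given, an n-gram is counted only when its
    leading (n-1)-gram is in that set (the assumed prediction filter).
    """
    counts = Counter()
    for sentence in sentences:
        tokens = normalize(sentence).split()
        for i in range(len(tokens) - n + 1):
            if allowed_prefixes is not None:
                prefix = " ".join(tokens[i:i + n - 1])
                if prefix not in allowed_prefixes:
                    continue  # prefix was too rare, skip this candidate
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def passing_terms(counts, min_count):
    # Keep only the terms whose frequency meets the (hypothetical) threshold.
    return {term for term, c in counts.items() if c >= min_count}


sentences = ["Heart attack risk", "heart attack rate", "Lung cancer risk"]
unigrams = count_ngrams(sentences, 1)
frequent_unigrams = passing_terms(unigrams, min_count=2)
bigrams = count_ngrams(sentences, 2, allowed_prefixes=frequent_unigrams)
```

In this toy input, "lung cancer" and "cancer risk" are never counted because their leading unigrams fall below the threshold, so the bigram table stays smaller than a full enumeration would be.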
II. N-gram Set with Prediction Filter:
III. Example Walk-through (MEDLINE.2014):
Preprocess | unigrams | bigrams | trigrams | fourgrams | fivegrams |
---|---|---|---|---|---|
N | n=1 | n=2 | n=3 | n=4 | n=5 |
Step 1: PmidTiAbSentences{YY}n{DDDD}.txt | | | | | |
Step 2: Gen uniGram, sorted | | | | | |
Step 3: Norm (n-1)-gram | | | | | |
Step 4: Gen (n-1)-gram for threshold on NWC | | | | | |
Step 5: Gen n-Gram | | | | | |
Step 6: Sort n-Gram | 33 min. | 45 min. | 40 min. | 33 min. | |
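The way Steps 2 through 6 chain together for n = 1 to 5 can be sketched as a level-by-level loop. This is an illustration under assumptions, not the actual pipeline: the function name `build_ngram_sets`, the `min_count` threshold, lowercasing as the normalization, filtering on the leading (n-1)-gram, and the sort order (descending count, then term) are all hypothetical choices.

```python
from collections import Counter


def build_ngram_sets(sentences, max_n=5, min_count=2):
    """Return {n: [(term, count), ...]}, one sorted list per n-gram size."""
    # Assumed normalization: lowercase and split on whitespace.
    tokenized = [s.lower().split() for s in sentences]
    results = {}
    kept = None  # (n-1)-grams that met the threshold in the previous pass
    for n in range(1, max_n + 1):
        counts = Counter()
        for tokens in tokenized:
            for i in range(len(tokens) - n + 1):
                gram = tuple(tokens[i:i + n])
                # Skip candidates whose leading (n-1)-gram was filtered out.
                if kept is not None and gram[:-1] not in kept:
                    continue
                counts[gram] += 1
        # Sort the n-grams for this level (descending count, then term).
        results[n] = sorted(
            ((" ".join(g), c) for g, c in counts.items()),
            key=lambda pair: (-pair[1], pair[0]),
        )
        # Threshold this level so it can filter the next one.
        kept = {g for g, c in counts.items() if c >= min_count}
    return results


ngram_sets = build_ngram_sets(["a b c d", "a b c e", "x y"], max_n=3)
```

Only one level of counts plus the set of surviving (n-1)-grams is held at a time, which is the memory saving this kind of filter is meant to provide.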