N-gram Set by Prediction Filter
A new approach, the prediction filter, was developed to work around limited memory. It retrieves an approximate n-gram set rather than an exact one. Because the result is not comprehensive, it should be replaced by a more thorough approach.
I. Prediction Filter:
Use the frequency (NWC) of normalized n-gram terms as a filter to generate (n+1)-gram terms:
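A minimal sketch of one way such a filter could work, not the implementation used to build this set. It assumes that an (n+1)-gram is only counted when its leading n-gram reaches a frequency threshold, that normalization means lowercasing, and that tokens are split on whitespace. The names `count_ngrams`, `prediction_filter`, and `min_count` are hypothetical.

```python
from collections import Counter


def count_ngrams(sentences, n, allowed_prefixes=None):
    """Count n-grams across sentences.

    When allowed_prefixes is given, an n-gram is counted only if its
    leading (n-1)-gram is in that set, so rare candidates never occupy
    memory. (Assumed pruning rule, not taken from the source.)
    """
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()  # assumed normalization
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            if allowed_prefixes is not None and gram[:-1] not in allowed_prefixes:
                continue
            counts[gram] += 1
    return counts


def prediction_filter(counts, min_count):
    """Return the set of n-grams whose frequency reaches min_count."""
    return {gram for gram, freq in counts.items() if freq >= min_count}


# Toy input; the sentences are illustrative only.
sentences = [
    "heart attack risk",
    "heart attack rate",
    "rare heart defect",
]
unigram_counts = count_ngrams(sentences, 1)
kept_unigrams = prediction_filter(unigram_counts, min_count=2)
bigram_counts = count_ngrams(sentences, 2, allowed_prefixes=kept_unigrams)
```

Here `("rare", "heart")` is never counted because `rare` falls below the threshold. That is where the memory saving comes from, and also why the resulting set is approximate: a looser or tighter rule (for example, also checking the trailing n-gram) would keep a different set of candidates.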
II. N-gram Set with Prediction Filter:
III. Example Walk-through (MEDLINE.2014):
Preprocess | unigrams | bigrams | trigrams | fourgrams | fivegrams | |
---|---|---|---|---|---|---|
N | n=1 | n=2 | n=3 | n=4 | n=5 | |
Step 1 PmidTiAbSentences{YY}n{DDDD}.txt | | | | | | |
Step 2 Gen uniGram, sorted | | | | | | |
Step 3 Norm (n-1)-gram | | | | | | |
Step 4 Gen (n-1)-gram for threshold on NWC | | | | | | |
Step 5 Gen n-Gram | | | | | | |
Step 6 Sort n-Gram | 33 min. | 45 min. | 40 min. | 33 min. |
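
The steps in the table can be read as a loop over n, where each pass uses the previous pass's counts as the filter. The sketch below follows that reading under stated assumptions: normalization is lowercasing, the threshold is a single `min_count` applied at every level, and sorting is by descending frequency and then alphabetically. The function names and the in-memory input are hypothetical; the actual process works on files such as PmidTiAbSentences{YY}n{DDDD}.txt.

```python
from collections import Counter


def normalize(tokens):
    """Assumed normalization: lowercase each token."""
    return tuple(tok.lower() for tok in tokens)


def build_ngram_sets(sentences, max_n=5, min_count=2):
    """Generate sorted n-gram lists for n = 1..max_n.

    Each level counts only n-grams whose leading (n-1)-gram met
    min_count at the previous level (steps 3 to 5), then sorts the
    result (steps 2 and 6).
    """
    results = {}
    kept = None  # no filter for unigrams
    for n in range(1, max_n + 1):
        counts = Counter()
        for sentence in sentences:
            tokens = sentence.split()
            for i in range(len(tokens) - n + 1):
                gram = normalize(tokens[i:i + n])
                if kept is not None and gram[:-1] not in kept:
                    continue
                counts[gram] += 1
        # Sort by descending frequency, then alphabetically.
        results[n] = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        # Thresholded set used as the filter for the next level.
        kept = {gram for gram, freq in counts.items() if freq >= min_count}
    return results


# Toy input; the sentences are illustrative only.
results = build_ngram_sets(
    ["Heart attack risk", "heart attack rate", "rare heart defect"],
    max_n=3,
    min_count=2,
)
```

Only one level of counts has to be held in memory at a time, together with the filtered set from the level before it.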