Pre-Process: Ui-Ti-Ab-Words
- Description:
This file includes all words from the titles and abstracts in the training set (MEDLINE). The words are tokenized and filtered by the rules and algorithm.
- Input files:
- Java Files & Algorithm:
- Read in titles from uiTiWords.${NUM}.txt
- Read in abstracts from uiAbWords.${NUM}.txt
- Combine uiTiWords.${NUM}.txt and uiAbWords.${NUM}.txt by PMID
- Print to uiTiAbWords.${NUM}.txt
- Output:
- TIAB/uiTiAbWords.${NUM}.txt, used to generate Wc and Dc
PMID | Words from title and abstract
---|---
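
The combine step listed under "Java Files & Algorithm" can be sketched as follows. This is a minimal illustration, not the project's actual code: the class name `CombineTiAbWords`, its methods, and the command-line arguments are hypothetical, and the sketch assumes each input line has the form `PMID|words`, with words separated by spaces.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: merges title words and abstract words by PMID.
public class CombineTiAbWords {

    // Combine title lines and abstract lines into one entry per PMID.
    // Title words come first; PMIDs keep their first-seen order.
    static Map<String, String> combine(List<String> tiLines, List<String> abLines) {
        Map<String, String> byPmid = new LinkedHashMap<>();
        for (String line : tiLines) {
            addLine(byPmid, line);
        }
        for (String line : abLines) {
            addLine(byPmid, line);
        }
        return byPmid;
    }

    // Parse one "PMID|words" line (assumed format) and append its words.
    private static void addLine(Map<String, String> byPmid, String line) {
        int sep = line.indexOf('|');
        if (sep < 0) {
            return; // skip lines without a field separator
        }
        String pmid = line.substring(0, sep).trim();
        String words = line.substring(sep + 1).trim();
        if (words.isEmpty()) {
            byPmid.putIfAbsent(pmid, "");
            return;
        }
        byPmid.merge(pmid, words, (a, b) -> a.isEmpty() ? b : a + " " + b);
    }

    // Render the combined map back into "PMID|words" lines.
    static List<String> toLines(Map<String, String> byPmid) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : byPmid.entrySet()) {
            out.add(e.getKey() + "|" + e.getValue());
        }
        return out;
    }

    // args: <title words file> <abstract words file> <output file>
    public static void main(String[] args) throws IOException {
        List<String> ti = Files.readAllLines(Paths.get(args[0]));
        List<String> ab = Files.readAllLines(Paths.get(args[1]));
        Files.write(Paths.get(args[2]), toLines(combine(ti, ab)));
    }
}
```

A PMID that appears in only one of the two inputs is still written out, carrying just the words that were found for it.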