The MEDLINE n-gram set is used to retrieve multiwords for building the SPECIALIST lexicon. The Lexical Systems Group (LSG) would like to share this n-gram set (n = 1 to 5) with the NLP/MLP community. Please download it from the links below.
Year | Document Count | Sentence Count | Word Count | N-grams | Distilled N-grams | Distilled / N-grams (%) | Download |
---|---|---|---|---|---|---|---|
2025 | 38,201,553 | 270,098,242 | 5,676,864,905 | 36,370,468 | 14,722,972 | 40.48% | The MEDLINE n-gram set 2025 |
2024 | 36,555,430 | 253,923,392 | 5,326,576,788 | 34,160,908 | 13,775,979 | 40.33% | The MEDLINE n-gram set 2024 |
2023 | 34,960,700 | 238,939,832 | 5,001,000,732 | 32,107,061 | 12,779,900 | 39.80% | The MEDLINE n-gram set 2023 |
2022 | 33,405,863 | 224,228,682 | 4,680,725,429 | 30,090,771 | 11,949,720 | 39.71% | The MEDLINE n-gram set 2022 |
2021 | 31,850,051 | 209,685,517 | 4,365,354,060 | 28,103,252 | 11,127,802 | 39.60% | The MEDLINE n-gram set 2021 |
2020 | 30,420,660 | 196,566,513 | 4,080,670,967 | 26,310,808 | 10,354,021 | 39.35% | The MEDLINE n-gram set 2020 |
2019 | 29,138,919 | 185,619,887 | 3,824,268,997 | 24,666,816 | 9,595,606 | 38.90% | The MEDLINE n-gram set 2019 |
2018 | 27,837,540 | 174,395,209 | 3,585,789,820 | 23,171,133 | 8,979,895 | 38.75% | The MEDLINE n-gram set 2018 |
2017 | 26,759,399 | 163,021,640 | 3,386,661,350 | 21,963,037 | 8,461,972 | 38.53% | The MEDLINE n-gram set 2017 |
2016 | 24,358,442 | 143,471,776 | 2,971,013,236 | 19,325,338 | 7,402,848 | 38.31% | The MEDLINE n-gram set 2016 |
2015 | 23,343,329 | 134,834,507 | 2,786,085,158 | 18,148,692 | 6,793,561 | 37.43% | The MEDLINE n-gram set 2015 |
2014 | 22,356,869 | 126,612,705 | 2,610,209,406 | 17,023,819 | 6,351,392 | 37.31% | The MEDLINE n-gram set 2014 |
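The percentage column in the table above appears to be the distilled n-gram count divided by the total n-gram count. The following is a minimal sketch of that arithmetic, using three rows copied from the table; the names `COUNTS` and `distilled_share` are hypothetical and are not part of any LSG tool.

```python
# Rows copied from the table above: year -> (n-grams, distilled n-grams).
# Only three years are included to keep the sketch short.
COUNTS = {
    2025: (36_370_468, 14_722_972),
    2024: (34_160_908, 13_775_979),
    2014: (17_023_819, 6_351_392),
}


def distilled_share(ngrams: int, distilled: int) -> float:
    """Return distilled n-grams as a percentage of all n-grams (2 decimals)."""
    return round(100.0 * distilled / ngrams, 2)


# Print the share for each year, newest first.
for year, (ngrams, distilled) in sorted(COUNTS.items(), reverse=True):
    print(f"{year}: {distilled_share(ngrams, distilled):.2f}%")
```

For these three rows the computed values agree with the published percentages (40.48%, 40.33%, and 37.31%), which supports reading the column as distilled n-grams over total n-grams.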
References: