The MEDLINE n-gram set is used to retrieve multiwords for building the SPECIALIST lexicon. The Lexical Systems Group (LSG) would like to share this n-gram set (n = 1 to 5) with the NLP/MLP community. Each yearly release can be downloaded from the links in the table below.
Year | Document Count | Sentence Count | Word Count | N-grams | Distilled N-grams | Distilled / N-grams (%) | Download |
---|---|---|---|---|---|---|---|
2024 | 36,555,430 | 253,923,392 | 5,326,576,788 | 34,160,908 | 13,775,979 | 40.33% | The MEDLINE n-gram set 2024 |
2023 | 34,960,700 | 238,939,832 | 5,001,000,732 | 32,107,061 | 12,779,900 | 39.80% | The MEDLINE n-gram set 2023 |
2022 | 33,405,863 | 224,228,682 | 4,680,725,429 | 30,090,771 | 11,949,720 | 39.71% | The MEDLINE n-gram set 2022 |
2021 | 31,850,051 | 209,685,517 | 4,365,354,060 | 28,103,252 | 11,127,802 | 39.60% | The MEDLINE n-gram set 2021 |
2020 | 30,420,660 | 196,566,513 | 4,080,670,967 | 26,310,808 | 10,354,021 | 39.35% | The MEDLINE n-gram set 2020 |
2019 | 29,138,919 | 185,619,887 | 3,824,268,997 | 24,666,816 | 9,595,606 | 38.90% | The MEDLINE n-gram set 2019 |
2018 | 27,837,540 | 174,395,209 | 3,585,789,820 | 23,171,133 | 8,979,895 | 38.75% | The MEDLINE n-gram set 2018 |
2017 | 26,759,399 | 163,021,640 | 3,386,661,350 | 21,963,037 | 8,461,972 | 38.53% | The MEDLINE n-gram set 2017 |
2016 | 24,358,442 | 143,471,776 | 2,971,013,236 | 19,325,338 | 7,402,848 | 38.31% | The MEDLINE n-gram set 2016 |
2015 | 23,343,329 | 134,834,507 | 2,786,085,158 | 18,148,692 | 6,793,561 | 37.43% | The MEDLINE n-gram set 2015 |
2014 | 22,356,869 | 126,612,705 | 2,610,209,406 | 17,023,819 | 6,351,392 | 37.31% | The MEDLINE n-gram set 2014 |
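The distilled-to-total percentage column is the distilled n-gram count divided by the total n-gram count, expressed as a percentage and rounded to two decimals. The minimal Python sketch below recomputes it for a few rows copied from the table; the `distilled_ratio` helper is hypothetical and is not part of any LSG tool.

```python
# A few rows copied from the table: year -> (n-gram count, distilled n-gram count).
ROWS = {
    2024: (34_160_908, 13_775_979),
    2023: (32_107_061, 12_779_900),
    2014: (17_023_819, 6_351_392),
}


def distilled_ratio(ngrams: int, distilled: int) -> float:
    """Hypothetical helper: distilled n-grams as a percentage of all n-grams, 2 decimals."""
    return round(100.0 * distilled / ngrams, 2)


# Print the recomputed percentage for each year, newest first.
for year, (ng, dng) in sorted(ROWS.items(), reverse=True):
    print(f"{year}: {distilled_ratio(ng, dng):.2f}%")
```

The recomputed values (40.33%, 39.80%, 37.31%) match the corresponding rows of the table.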
References: