
MEDLINE N-Gram Set



The MEDLINE n-gram set is used to retrieve multiwords for building the SPECIALIST Lexicon. The Lexical Systems Group (LSG) shares this n-gram set (n = 1 to 5) with the NLP/MLP community. The yearly sets can be downloaded from the links in the table below.
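For readers new to the term, an n-gram here is a contiguous sequence of n words, with n ranging from 1 to 5. The following is a minimal sketch of counting such n-grams, assuming plain whitespace tokenization and lowercasing. The function names `extract_ngrams` and `count_ngrams` are hypothetical, and this is not the LSG pipeline, whose tokenization and filtering steps are not described on this page.

```python
from collections import Counter


def extract_ngrams(tokens, max_n=5):
    """Yield every contiguous n-gram (joined by spaces) for n = 1..max_n."""
    for n in range(1, max_n + 1):
        # For sentences shorter than n tokens this range is empty.
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def count_ngrams(sentences, max_n=5):
    """Count n-grams over an iterable of sentence strings."""
    counts = Counter()
    for sentence in sentences:
        # Assumption: lowercase and split on whitespace; a real
        # pipeline would likely use a proper tokenizer.
        tokens = sentence.lower().split()
        counts.update(extract_ngrams(tokens, max_n))
    return counts


if __name__ == "__main__":
    demo = ["Heart attack risk", "heart attack"]
    for ngram, freq in count_ngrams(demo).most_common(3):
        print(freq, ngram)
```

N-grams never cross sentence boundaries in this sketch, because each sentence is tokenized and processed on its own.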

| Year | Document Count | Sentence Count | Word Count | N-grams | Distilled N-grams | DNg/Ng % | Download |
|------|----------------|----------------|------------|---------|-------------------|----------|----------|
| 2025 | 38,201,553 | 270,098,242 | 5,676,864,905 | 36,370,468 | 14,722,972 | 40.48% | The MEDLINE n-gram set 2025 |
| 2024 | 36,555,430 | 253,923,392 | 5,326,576,788 | 34,160,908 | 13,775,979 | 40.33% | The MEDLINE n-gram set 2024 |
| 2023 | 34,960,700 | 238,939,832 | 5,001,000,732 | 32,107,061 | 12,779,900 | 39.80% | The MEDLINE n-gram set 2023 |
| 2022 | 33,405,863 | 224,228,682 | 4,680,725,429 | 30,090,771 | 11,949,720 | 39.71% | The MEDLINE n-gram set 2022 |
| 2021 | 31,850,051 | 209,685,517 | 4,365,354,060 | 28,103,252 | 11,127,802 | 39.60% | The MEDLINE n-gram set 2021 |
| 2020 | 30,420,660 | 196,566,513 | 4,080,670,967 | 26,310,808 | 10,354,021 | 39.35% | The MEDLINE n-gram set 2020 |
| 2019 | 29,138,919 | 185,619,887 | 3,824,268,997 | 24,666,816 | 9,595,606 | 38.90% | The MEDLINE n-gram set 2019 |
| 2018 | 27,837,540 | 174,395,209 | 3,585,789,820 | 23,171,133 | 8,979,895 | 38.75% | The MEDLINE n-gram set 2018 |
| 2017 | 26,759,399 | 163,021,640 | 3,386,661,350 | 21,963,037 | 8,461,972 | 38.53% | The MEDLINE n-gram set 2017 |
| 2016 | 24,358,442 | 143,471,776 | 2,971,013,236 | 19,325,338 | 7,402,848 | 38.31% | The MEDLINE n-gram set 2016 |
| 2015 | 23,343,329 | 134,834,507 | 2,786,085,158 | 18,148,692 | 6,793,561 | 37.43% | The MEDLINE n-gram set 2015 |
| 2014 | 22,356,869 | 126,612,705 | 2,610,209,406 | 17,023,819 | 6,351,392 | 37.31% | The MEDLINE n-gram set 2014 |
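
The DNg/Ng % column appears to be the distilled n-gram count divided by the total n-gram count, expressed as a percentage. The short sketch below recomputes it for the 2025 and 2014 rows using the figures from the table. The helper name `distilled_ratio` is hypothetical and only serves the illustration.

```python
def distilled_ratio(ngrams, distilled):
    """Return distilled n-grams as a percentage of all n-grams."""
    return 100.0 * distilled / ngrams


# (N-grams, Distilled N-grams) copied from the 2025 and 2014 rows above.
rows = {
    2025: (36_370_468, 14_722_972),
    2014: (17_023_819, 6_351_392),
}

if __name__ == "__main__":
    for year, (ngrams, distilled) in rows.items():
        print(f"{year}: {distilled_ratio(ngrams, distilled):.2f}%")
```

Rounded to two decimals, these two rows give 40.48% and 37.31%, matching the published column.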

References: