These tools will no longer be maintained as of December 31, 2024. An archived copy of this webpage can be found here. The Indexing Initiative GitHub repository is under development. Contact NLM Customer Service if you have questions.

SemMedDB Database Download

This is the final update for the SemMedDB Database.

We have deleted all previous versions of the SemMedDB database (versions 31, 40, 41, and 42).

The databases available for download here (version 3.0 onward) were created using the new database schema. The schema and database information are available here.

The file names consist of five parts:

  1. database name (semmedVER43),
  2. year (2024),
  3. the letter "R",
  4. table name (e.g., SENTENCE), and
  5. one or more suffixes (sql, csv, etc.).

The letter R indicates that the database was generated with standard SemRep options.
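
As an illustration of the naming scheme, here is a minimal Python sketch that splits a file name into the five parts listed above. The page does not state how the parts are joined, so the underscore separators, the dot before the suffixes, and the example file name are assumptions made for this sketch; `SemMedFileName` and `parse_file_name` are hypothetical helper names.

```python
from typing import NamedTuple


class SemMedFileName(NamedTuple):
    database: str  # e.g. "semmedVER43"
    year: str      # e.g. "2024"
    options: str   # "R" = standard SemRep options
    table: str     # e.g. "SENTENCE"
    suffix: str    # e.g. "sql" or "csv", possibly followed by more suffixes


def parse_file_name(file_name: str) -> SemMedFileName:
    """Split a file name into its five parts.

    Assumption: the first four parts are joined by underscores and the
    suffixes start at the first dot. Table names may themselves contain
    underscores (e.g. PREDICATION_AUX), so only the first three
    underscores are treated as separators.
    """
    stem, _, suffix = file_name.partition(".")
    database, year, options, table = stem.split("_", 3)
    return SemMedFileName(database, year, options, table, suffix)


# Hypothetical file name built from the parts described above.
parts = parse_file_name("semmedVER43_2024_R_PREDICATION_AUX.csv")
print(parts.table)  # PREDICATION_AUX
```

If the actual files use a different separator, only the `partition` and `split` calls would need to change.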

The new database schema differs from the previous one (versions 2X) in the following ways:

  1. We simplified the schema significantly by removing the CONCEPT, CONCEPT_SEMTYPE, PREDICATION_ARGUMENT, and SENTENCE_PREDICATION tables. The relevant contents of these tables can still be derived from PREDICATION if needed.
  2. A GENERIC_CONCEPT table has been added to the schema. This table contains generic concepts, as indicated by SemRep. The concepts that are not in this table are considered novel.
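
The first point says the contents of the removed tables can still be derived from PREDICATION. The sketch below shows one way a concept list might be rebuilt. It uses an in-memory SQLite database so that it runs on its own (the real downloads target MySQL), and the column names SUBJECT_CUI, SUBJECT_NAME, OBJECT_CUI, and OBJECT_NAME are assumptions that should be checked against the published schema. The rows are placeholder values, not real data.

```python
import sqlite3


def derive_concepts(conn: sqlite3.Connection) -> list:
    """Return the distinct (CUI, NAME) pairs that appear as a subject or
    an object in PREDICATION. UNION removes duplicate pairs."""
    query = """
        SELECT SUBJECT_CUI AS CUI, SUBJECT_NAME AS NAME FROM PREDICATION
        UNION
        SELECT OBJECT_CUI, OBJECT_NAME FROM PREDICATION
        ORDER BY CUI
    """
    return conn.execute(query).fetchall()


# Toy stand-in for the PREDICATION table (assumed columns, placeholder rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PREDICATION ("
    "SUBJECT_CUI TEXT, SUBJECT_NAME TEXT, OBJECT_CUI TEXT, OBJECT_NAME TEXT)"
)
conn.executemany(
    "INSERT INTO PREDICATION VALUES (?, ?, ?, ?)",
    [
        ("C0000001", "concept A", "C0000002", "concept B"),
        ("C0000002", "concept B", "C0000003", "concept C"),
    ],
)

for cui, name in derive_concepts(conn):
    print(cui, name)
```

Against the full table, a query of this shape scans all predications twice, so it may be worth materializing the result once instead of recomputing it.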

We no longer produce an annual release of the database of predications generated by SemRep with sortal anaphora resolution. For details on sortal anaphora resolution in SemRep, see our BMC Bioinformatics paper.

The GENERIC_CONCEPT table was updated in the June 30, 2018 release and in all subsequent releases. Consequently, the SUBJECT_NOVELTY and OBJECT_NOVELTY columns of the PREDICATION table have been updated as follows: if the concept is not in the GENERIC_CONCEPT table, the value is set to 1; otherwise, it is set to 0.
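
The novelty rule above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not the code used to build the database. It assumes concepts are compared by identifier; the function name `novelty` and the placeholder identifiers are hypothetical.

```python
from typing import AbstractSet


def novelty(concept_id: str, generic_concepts: AbstractSet[str]) -> int:
    """Return 1 if the concept is absent from GENERIC_CONCEPT (novel),
    otherwise 0 (generic)."""
    return 0 if concept_id in generic_concepts else 1


# Placeholder identifiers standing in for rows of GENERIC_CONCEPT.
generic = {"C0000001"}

print(novelty("C0000001", generic))  # 0: listed as generic
print(novelty("C0000002", generic))  # 1: not listed, so considered novel
```

The same function would apply to both the subject and the object of a predication to obtain SUBJECT_NOVELTY and OBJECT_NOVELTY.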

Starting with version VER40, the PMID column in SENTENCE references the PMID column in the CITATIONS table through a foreign key constraint. Therefore, every PMID in the SENTENCE table has a corresponding row in the CITATIONS table, which holds the citation metadata for that PMID.
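
The guarantee described above means that no sentence should refer to a PMID that is missing from the citation table (listed as CITATIONS in the download tables). A check of that property might look like the sketch below, which could be useful after loading the CSV files into a store that does not enforce constraints. It runs against an in-memory SQLite database with placeholder rows; the SENTENCE_ID column and the helper name `orphan_pmids` are assumptions made for this example.

```python
import sqlite3


def orphan_pmids(conn: sqlite3.Connection) -> list:
    """Return the PMIDs used in SENTENCE that have no row in CITATIONS.
    With the foreign key constraint in place this list should be empty."""
    query = """
        SELECT DISTINCT s.PMID
        FROM SENTENCE AS s
        LEFT JOIN CITATIONS AS c ON c.PMID = s.PMID
        WHERE c.PMID IS NULL
        ORDER BY s.PMID
    """
    return [row[0] for row in conn.execute(query)]


# Toy tables with placeholder values; sentence 3 points at a missing PMID.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CITATIONS (PMID TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE SENTENCE (SENTENCE_ID INTEGER PRIMARY KEY, PMID TEXT)")
conn.executemany("INSERT INTO CITATIONS VALUES (?)", [("1",), ("2",)])
conn.executemany(
    "INSERT INTO SENTENCE VALUES (?, ?)", [(1, "1"), (2, "2"), (3, "9")]
)

print(orphan_pmids(conn))  # ['9']
```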


Please note that all downloads in the tables below require a UMLS Terminology Services (UTS) account; to sign up for a UTS account, please click here.

Database name: semmedVER43_R (Processed using MEDLINE BASELINE 2022 + PubMed Update Files through May 8, 2024)

SemRep version: Regular SemRep version 1.8
Number of citations processed: 37,233,341
Number of predications: 130,480,195
* This database was obtained from SemRep results with the anaphora feature turned off.

The next table contains links to MySQL download files.

TABLE NAME        Size   # Rows          Download link   sha1sum    md5sum
CITATIONS         177M   37,233,341      download        download   download
ENTITY            46G    1,982,424,585   download        download   download
GENERIC_CONCEPT   4.7K   259             download        download   download
PREDICATION       3.1G   130,480,195     download        download   download
PREDICATION_AUX   4.0G   130,480,181     download        download   download
SENTENCE          16G    263,160,520     download        download   download

 


The next table contains links to CSV files.

TABLE NAME        Size   # Rows          Download link   sha1sum    md5sum
CITATIONS         177M   37,233,341      download        download   download
ENTITY            48G    1,982,424,585   download        download   download
GENERIC_CONCEPT   3.9K   259             download        download   download
PREDICATION       3.2G   130,480,195     download        download   download
PREDICATION_AUX   4.2G   130,480,181     download        download   download
SENTENCE          16G    263,160,520     download        download   download
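
Each row in the tables above links to a sha1sum and an md5sum alongside the download. A minimal Python sketch for checking a downloaded file against one of those checksums follows. It assumes the checksum file holds the hexadecimal digest as its first whitespace-separated field, as in the usual `sha1sum` and `md5sum` output format; the function names and the file name in the usage comment are hypothetical.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha1", chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so multi-gigabyte downloads
    do not need to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, checksum_text: str, algorithm: str = "sha1") -> bool:
    """Compare a file's digest with the published checksum text.

    Assumption: the digest is the first whitespace-separated field,
    optionally followed by a file name.
    """
    expected = checksum_text.split()[0].lower()
    return file_digest(path, algorithm) == expected


# Usage (hypothetical file name and checksum text):
# ok = matches_checksum("SENTENCE.csv.gz", "<published sha1 digest>  SENTENCE.csv.gz")
# ok_md5 = matches_checksum("SENTENCE.csv.gz", "<published md5 digest>", algorithm="md5")
```

Reading in chunks matters here because the largest files listed above are tens of gigabytes.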