Evaluating the Quality and Interoperability of Biomedical Terminologies

Technical Report to the LHNCBC Board of Scientific Counselors, April 2018

Biomedical terminologies and ontologies are enabling resources for clinical decision support systems and
data integration systems for translational research and health analytics. Therefore, the quality of these resources
has a direct impact on healthcare and biomedical research. In the past decade, quality assurance
(QA) of biomedical terminologies has become a key issue in the development of standard terminologies
and has emerged as an active field of research. Approaches to quality assurance include the use of lexical,
structural and semantic techniques applied to biomedical terminologies, as well as techniques for comparing
and contrasting these resources.

As part of the Medical Ontology Research project, we have explored quality assurance and interoperability
issues in a variety of biomedical terminologies including drug terminologies, clinical terminologies,
and specialized terminologies, such as the Human Phenotype Ontology (HPO) and the Orphanet terminology
for rare diseases. In this report, we review 32 investigations performed in our research group since
this project was last reviewed by the BSC in 2010. About half of these investigations have a primary focus
on quality assurance, for which we developed novel methods. In the other half, we applied existing
techniques to assess interoperability among terminologies or some aspect of quality (e.g., coverage) in a
terminology. In our work, we put special emphasis on the development of principled, automated, scalable
methods, applied systematically to the entire content of a terminology by independent researchers, as opposed
to manual review of subsets by domain experts.

The QA processes we developed have proved effective in identifying a limited number of errors that had
escaped the quality assurance mechanisms in place in terminology development systems. We have shared
our findings and techniques with the scientific community through scientific publications and presentations
at conferences. Whenever possible, we have also reported these issues to the developers of the biomedical
terminologies we investigated.

This work is also a contribution to the LHC Training Program, since 21 of the 32 studies listed in this report
(66%) have involved post-doctoral fellows or summer (graduate and undergraduate) students.

