De-identifying Medical Records with MIST (MITRE Identification Scrubber Toolkit)

Date: October 13, 2011
Time: (All day)
Event Type: Lecture

De-identification of the narrative portions of medical records has become increasingly important: it lets researchers, particularly in translational medicine, mine the rich data in those narratives while preventing disclosure of personal health identifiers (PHI). This talk describes MIST, MITRE's machine-learning-based de-identification toolkit. MIST makes it possible to tailor automated de-identification rapidly to particular document types, and it supports handing the de-identification software over to medical end users, so that developers never need access to the original medical records. MIST contains an annotation tool for preparing gold-standard training data, a machine learning engine for building models from that gold-standard data, a runtime engine for finding PHI, and a replacement engine to transform or redact PHI. MIST has been applied to multiple kinds of clinical data and achieved excellent performance on the 2007 i2b2 de-identification evaluation challenge; it has also been applied to patient records at Vanderbilt, the University of Michigan, and VA Boston. We will report on our results, including our experiences with rapid retargeting of MIST to different types of clinical records. We conclude with a discussion of new metrics for assessing the output of de-identification systems. The MIST software is available from the MITRE Corporation under an open-source license.
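
To make the last two stages described above concrete (a runtime engine that finds PHI, then a replacement engine that redacts it), here is a minimal Python sketch. It is not MIST's API or its method: MIST uses a model trained on gold-standard annotations, whereas this sketch substitutes two hand-written regular expressions as a stand-in tagger. The names Span, PATTERNS, find_phi, and redact, and the sample note, are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A character range in the text that was tagged as PHI."""
    start: int
    end: int
    label: str


# Hypothetical stand-in for a trained model: two simple patterns.
# A real system would learn its tagger from annotated training data.
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def find_phi(text: str) -> list[Span]:
    """'Runtime' stage: return tagged spans in document order."""
    spans = [
        Span(m.start(), m.end(), label)
        for label, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(spans, key=lambda s: s.start)


def redact(text: str, spans: list[Span]) -> str:
    """'Replacement' stage: swap each span for a [LABEL] placeholder.

    Assumes the spans are sorted and do not overlap.
    """
    pieces = []
    cursor = 0
    for span in spans:
        pieces.append(text[cursor:span.start])
        pieces.append(f"[{span.label}]")
        cursor = span.end
    pieces.append(text[cursor:])
    return "".join(pieces)


note = "Seen on 10/13/2011; call 555-123-4567 to follow up."
spans = find_phi(note)
redacted = redact(note, spans)
print(redacted)  # Seen on [DATE]; call [PHONE] to follow up.
```

Keeping detection and replacement as separate steps mirrors the separation of engines in the description above: the same detected spans could feed a different replacement policy, such as substituting realistic surrogate values instead of placeholders.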