Testing Data: UMLS-Core
This list of terms was used in the UMLS-Core project. It serves as the gold standard for testing in this project.
I. General Information
- UMLS-Core: SCTMap_withCUI_201302 (provided by Dr. K.W. Fung)
- In MS Excel format, with the following columns:
Term Id | Local Term | SNOMED CID | SNOMED FSN | UMLS CUI
- Contains 15,487 terms with a valid mapped CUI (used as gold standard)
- Contains 13,077 unique terms
- 1,492 terms are duplicated under different IDs (sources)
- 35 terms have multiple CUIs (ambiguous)
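The counts above (total terms, unique terms, terms duplicated under different IDs, terms with multiple CUIs) could be reproduced along the following lines. This is only a sketch: the function name `summarize`, the tuple layout, the case-sensitive term comparison, and the sample IDs and CUIs are assumptions made for illustration, not part of the original data or tooling.

```python
from collections import defaultdict


def summarize(rows):
    """Summarize (term_id, local_term, cui) rows.

    Terms are compared exactly as written (case-sensitive); that choice
    is an assumption of this sketch.
    """
    ids_by_term = defaultdict(set)
    cuis_by_term = defaultdict(set)
    total = 0
    for term_id, term, cui in rows:
        total += 1
        ids_by_term[term].add(term_id)
        cuis_by_term[term].add(cui)
    return {
        "total_terms": total,
        "unique_terms": len(ids_by_term),
        # same local term submitted under more than one ID (source)
        "terms_with_multiple_ids": sum(
            1 for ids in ids_by_term.values() if len(ids) > 1
        ),
        # same local term mapped to more than one CUI (ambiguous)
        "terms_with_multiple_cuis": sum(
            1 for cuis in cuis_by_term.values() if len(cuis) > 1
        ),
    }


# Placeholder rows; the IDs and CUIs are invented for the example.
sample = [
    ("HA1", "asthma", "C0000001"),
    ("KP7", "asthma", "C0000001"),
    ("MA3", "cold", "C0000002"),
    ("NU9", "cold", "C0000003"),
    ("RI2", "fever", "C0000004"),
]
stats = summarize(sample)
```

In this toy input, "asthma" and "cold" each appear under two IDs, and only "cold" carries two different CUIs.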
II. Data Processing
- Convert from Excel to CSV format
- Convert from CSV to pipe-separated format (gov.nih.nlm.nls.stmt.Lib.FromCsvToPipeFile)
- Filter out duplicated terms to obtain unique term|CUI pairs
- For testing input: retrieve field 2 (Local Term)
- For gold standard: retrieve fields 2 and 5 (Local Term, UMLS CUI)
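The processing steps above can be sketched as follows. This is not the implementation of `FromCsvToPipeFile`, whose behavior is not described here; the helper names `csv_to_pipe_lines` and `unique_term_cui`, the header-skipping option, and the sample rows (including the SNOMED IDs and CUIs) are all assumptions. The sketch also assumes that no field contains a literal `|`.

```python
import csv
import io


def csv_to_pipe_lines(csv_text, has_header=True):
    """Parse CSV text and return one pipe-separated line per data row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if has_header:
        rows = rows[1:]
    return ["|".join(field.strip() for field in row) for row in rows if row]


def unique_term_cui(pipe_lines, term_field=2, cui_field=5):
    """Return unique (term, CUI) pairs in first-seen order.

    Field numbers are 1-based, matching the field numbers used in the text.
    """
    seen = set()
    pairs = []
    for line in pipe_lines:
        fields = line.split("|")
        pair = (fields[term_field - 1], fields[cui_field - 1])
        if pair not in seen:
            seen.add(pair)
            pairs.append(pair)
    return pairs


# Placeholder data; identifiers are invented for the example.
SAMPLE_CSV = (
    "Term Id,Local Term,SNOMED CID,SNOMED FSN,UMLS CUI\n"
    'HA1,asthma,111,"Asthma (disorder)",C0000001\n'
    'KP7,asthma,111,"Asthma (disorder)",C0000001\n'
    'MA3,"pain, chest",222,"Chest pain (finding)",C0000002\n'
)

pipe_lines = csv_to_pipe_lines(SAMPLE_CSV)
pairs = unique_term_cui(pipe_lines)

# Testing input: field 2 only, one line per distinct term.
test_input = list(dict.fromkeys(term for term, _ in pairs))
# Gold standard: fields 2 and 5 as term|CUI.
gold_standard = [f"{term}|{cui}" for term, cui in pairs]
```

Using a CSV parser rather than splitting on commas keeps quoted terms such as "pain, chest" intact as a single field.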
III. Source of UMLS-Core data
- Problem list terminologies (local terms) from 6 (8) institutions
- HA: Hong Kong Hospital Authority
- IH: Intermountain Healthcare
- KP: Kaiser Permanente
- MA: Mayo Clinic
- NU: University of Nebraska Medical Center
- RI: Regenstrief Institute
- A problem list is a complete list of all of a patient's problems
- The data in the original paper:
- 76,237 terms and their usage frequencies in 14 million patients were submitted from six institutions
- 65,678 terms were unique across institutions
- Mappings from the local problem list terms to standard terminologies (ICD-9-CM, SNOMED CT) were included if available
- 14,395 terms covered 95% of usage in each institution (10,081 terms unique across institutions)
- 13,261 terms were successfully mapped to 6,776 UMLS concepts
- UMLS mapping - 2008AA: 10,812 (75%)
- exact match - case-insensitive: 8,102 (56%)
- normalized match: 2,035 (14%)
- synonym substitution: 576 (5%)
- local maps to standard terminologies: 1,007 (7%)
- automatically mapped - if labeled as exact match
- manually reviewed for exact match - if not labeled as exact match
- manual mapping using the RRF browser: 1,442 (10%)
- unmapped: 1,134 (8%)
- Highly specific: 53%
- Very general: 11%
- Administrative: 7%
- Laterality: 7%
- Negative finding: 3%
- Composite concept: 3%
- Meaning unclear (ambiguous): 2%
- Miscellaneous: 13%
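The automatic UMLS mapping stages listed above (case-insensitive exact match, then normalized match, then synonym substitution) can be illustrated with a small cascade. This is a sketch under stated assumptions, not the method used in the original paper: `normalize` is a simplified stand-in (lowercase, strip punctuation, sort words) rather than the UMLS lexical normalization tools, and the string table, synonym table, and CUIs are invented placeholders.

```python
import re


def normalize(term):
    """Simplified normalization: lowercase, drop punctuation, sort words."""
    words = re.findall(r"[a-z0-9]+", term.lower())
    return " ".join(sorted(words))


def map_term(term, umls_strings, synonyms=None):
    """Try each stage in order; return (cui, stage) or (None, "unmapped").

    umls_strings: dict of concept string -> CUI (hypothetical table).
    synonyms: dict of word -> preferred word (hypothetical table).
    """
    synonyms = synonyms or {}
    exact = {s.lower(): cui for s, cui in umls_strings.items()}
    normed = {normalize(s): cui for s, cui in umls_strings.items()}

    # Stage 1: exact match, ignoring case.
    key = term.lower()
    if key in exact:
        return exact[key], "exact"

    # Stage 2: match after normalization.
    norm_key = normalize(term)
    if norm_key in normed:
        return normed[norm_key], "normalized"

    # Stage 3: substitute synonyms word by word, then normalize again.
    swapped = normalize(" ".join(synonyms.get(w, w) for w in norm_key.split()))
    if swapped in normed:
        return normed[swapped], "synonym"

    return None, "unmapped"


# Placeholder tables; the CUIs are invented for the example.
UMLS_STRINGS = {"Chest pain": "C0000002", "Renal failure": "C0000005"}
SYNONYMS = {"kidney": "renal"}

results = {
    t: map_term(t, UMLS_STRINGS, SYNONYMS)
    for t in ["CHEST PAIN", "pain, chest", "kidney failure", "left knee pain"]
}
```

Ordering the stages from strictest to loosest means each term is attributed to the most conservative stage that succeeds, which is what makes per-stage counts like those above meaningful.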
- References: UMLS-Core Project