A. Noun Phrase Extraction:
Text is parsed into noun phrases using Phrasex. The design is modular, so Phrasex can be replaced with another noun phrase extractor in future development.
B. Layered Searching Strategy:
1. The noun phrases are used as query strings to the Inquery database (see the description in C below). The output of each query is a ranked list of MeSH headings.
2. The layered search method searches the first field. If no record is retrieved, it searches the next field, and so on.
3. The fields are searched in the following order:
a. TITLE
b. Synonyms
c. UMLS Related Concepts
d. UMLS Co-occurring Concepts
e. PubMed Citations
4. The ranking scores returned by the Inquery database are used as part of the computation of the Mapping Score. (Currently we do not perform any semantic aggregation of the ranked MeSH headings, as we did in our earlier standalone experiment. We believe that such aggregation would improve performance.)
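The layered strategy in steps 2 and 3 can be sketched as follows. This is a minimal illustration, not the system's actual code: `layered_search`, `toy_search`, and `TOY_INDEX` are hypothetical names, the in-memory index stands in for the Inquery query call, and the scores are made up. The short field names are borrowed from the record description in section C.

```python
from typing import Callable, Dict, List, Optional, Tuple

Hit = Tuple[str, float]  # (MeSH heading, ranking score)

# Search order from step 3, using the short field names of section C.
SEARCH_LAYERS = ["TITLE", "SYN", "REL", "COT", "PMCIT"]


def layered_search(
    phrase: str,
    search_field: Callable[[str, str], List[Hit]],
    layers: List[str] = SEARCH_LAYERS,
) -> Tuple[Optional[str], List[Hit]]:
    """Return (field, hits) for the first field that retrieves any record."""
    for field in layers:
        hits = search_field(field, phrase)
        if hits:
            # Stop at the first layer that produces results.
            return field, hits
    # No field retrieved a record for this phrase.
    return None, []


# Hypothetical stand-in for the Inquery database: a tiny in-memory index
# with made-up scores, keyed by field and lowercased phrase.
TOY_INDEX: Dict[str, Dict[str, List[Hit]]] = {
    "TITLE": {"asthma": [("Asthma", 0.9)]},
    "SYN": {"heart attack": [("Myocardial Infarction", 0.8)]},
}


def toy_search(field: str, phrase: str) -> List[Hit]:
    """Look up a phrase in one field of the toy index."""
    return TOY_INDEX.get(field, {}).get(phrase.lower(), [])


if __name__ == "__main__":
    for phrase in ["asthma", "Heart attack", "unknown phrase"]:
        print(phrase, "->", layered_search(phrase, toy_search))
```

Returning the matching field together with the hits is a design choice of this sketch; it would let a later scoring step take into account which layer produced the match.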
C. Inquery Database:
1. MeSH Main Headings serve as the titles of the records.
2. Each record includes the following fields:
TITLE: MeSH Main Heading
CUI: Concept Unique Identifier
SYN: Entry terms from MeSH, plus synonyms from UMLS
STY: UMLS Semantic Type
MN: MeSH Tree Number
REL: UMLS Related Concepts, including broader, narrower, and other related concepts.
COT: UMLS Co-occurring Concepts (the top 50 terms)
PMCIT: The top 10 PubMed citations (title, abstract, and MeSH headings) retrieved by querying with the MeSH Heading as Major Topic
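As an illustration of the record layout above, one record could be modeled as in the sketch below. The class name `MeshRecord`, the choice of Python types, and the truncation in `__post_init__` are assumptions made for this example. The actual Inquery document format is not shown here, and the sample values are placeholders.

```python
from dataclasses import dataclass, field
from typing import List

MAX_COT = 50    # top co-occurring concepts kept per record
MAX_PMCIT = 10  # top PubMed citations kept per record


@dataclass
class MeshRecord:
    """One record, keyed by a MeSH Main Heading (hypothetical model)."""
    title: str                                      # TITLE: MeSH Main Heading
    cui: str                                        # CUI: Concept Unique Identifier
    syn: List[str] = field(default_factory=list)    # SYN: entry terms and UMLS synonyms
    sty: List[str] = field(default_factory=list)    # STY: UMLS Semantic Type(s)
    mn: List[str] = field(default_factory=list)     # MN: MeSH Tree Number(s)
    rel: List[str] = field(default_factory=list)    # REL: UMLS Related Concepts
    cot: List[str] = field(default_factory=list)    # COT: UMLS Co-occurring Concepts
    pmcit: List[str] = field(default_factory=list)  # PMCIT: PubMed citation texts

    def __post_init__(self) -> None:
        # Enforce the stated caps, assuming the lists arrive ranked best-first.
        self.cot = self.cot[:MAX_COT]
        self.pmcit = self.pmcit[:MAX_PMCIT]


if __name__ == "__main__":
    # Placeholder values only; the CUI below is not a real identifier.
    record = MeshRecord(
        title="Asthma",
        cui="C0000000",
        cot=[f"term{i}" for i in range(60)],
        pmcit=[f"citation{i}" for i in range(12)],
    )
    print(len(record.cot), len(record.pmcit))
```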