Analysis of Consumer Health Questions for Development of Question-Answering Technology.
Objectives: To develop a computer system to answer consumer health questions by applying modern techniques in natural language processing and information retrieval. As a first-line reference system, the application will partially automate responses to users by searching and retrieving relevant documents from reliable, freely available consumer health information resources that are regularly reviewed and updated by library staff. The purpose of this project was to analyze the types of questions consumers submit to the National Library of Medicine (NLM), determine if the answers to the questions could be found in NLM resources, and create a taxonomy and annotation guidelines for consumer health questions for a machine-learning task.
Methods: A sample set of over 11,000 reference questions received by NLM customer service was examined, and a subset of questions that could potentially be answered automatically was identified. We aligned questions with potential answer sources. Questions related to genetic conditions were initially determined to be the best candidates for automatic answering. A taxonomy of question types and indicators was created in an iterative process.
Results: A taxonomy and annotation guidelines for consumer health questions containing named diseases were created. The final schema notes the general type of question, as well as extraneous information and relevant misspellings. It annotates distinct entities, such as medical problems and genes, as well as words that indicate a particular clinical concept question type, such as prevention, symptoms, prognosis, or treatment. The guidelines were successfully used to annotate a set of twenty questions with good inter-annotator agreement among four annotators.
Conclusions: Consumer health questions are challenging to answer automatically because the questions may be complex, vague, or misspelled, and the motivation behind a question is often difficult to determine. The taxonomy and guidelines created in this project can be used for a machine-learning task on questions containing named diseases and, with some modifications, may accommodate additional question types.
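
Illustrative sketch: the annotation schema summarized in the Results (a general question type plus labeled spans for entities, indicator words, misspellings, and extraneous text) could be represented along the following lines. This is a minimal sketch, not the project's actual annotation format; the class names, the label strings (`PROBLEM`, `INDICATOR`), and the example question are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    """A labeled character range within a question (hypothetical structure)."""
    start: int
    end: int
    label: str  # e.g. "PROBLEM", "GENE", "INDICATOR", "MISSPELLING", "EXTRANEOUS"


@dataclass
class AnnotatedQuestion:
    """One consumer health question with its general type and labeled spans."""
    text: str
    question_type: str  # e.g. "treatment", "symptoms", "prognosis", "prevention"
    spans: List[Span] = field(default_factory=list)

    def add_span(self, phrase: str, label: str) -> Span:
        """Label the first occurrence of `phrase` in the question text."""
        start = self.text.find(phrase)
        if start == -1:
            raise ValueError(f"phrase not found in question: {phrase!r}")
        span = Span(start, start + len(phrase), label)
        self.spans.append(span)
        return span

    def phrases(self, label: str) -> List[str]:
        """Return the text of every span carrying the given label."""
        return [self.text[s.start:s.end] for s in self.spans if s.label == label]


# Invented example question, not taken from the NLM sample set.
q = AnnotatedQuestion(
    text="My son has cystic fibrosis. What is the treatment?",
    question_type="treatment",
)
q.add_span("cystic fibrosis", "PROBLEM")   # named disease (medical problem entity)
q.add_span("treatment", "INDICATOR")       # word signaling the question type
```

Storing character offsets rather than bare strings keeps each annotation tied to a specific position in the question, which makes it straightforward to compare the spans chosen by different annotators.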