Cross-Language Information Retrieval

LHNCBC is no longer conducting active research on this project. Information is presented here for historical purposes.

From 2002 to 2006, this project focused on expanding user access to NLM's biomedical information resources, for example by supporting languages other than English, such as Spanish. The Unified Medical Language System (UMLS) is an extensive source of biomedical knowledge developed and maintained by NLM. Our approach for adapting the UMLS to multilingual applications, especially information retrieval, was applied mainly to a clinical trials repository, so that Spanish queries would retrieve relevant trials. Several Spanish-language prototypes for clinical trials retrieval were also developed in house and presented in conference papers.

Our cross-language information retrieval approach had three main components:

  • expanding the UMLS by adding relevant entries in other languages
  • combining multiple linguistic, machine-translation, and statistical approaches to facilitate information retrieval
  • adapting the UMLS lexical tools (e.g., LVG) for other languages
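
As a rough illustration of how multilingual entries in a concept-based vocabulary can support cross-language retrieval, the sketch below maps Spanish query phrases to concept identifiers and then to English search terms. It is a toy example under stated assumptions, not the project's implementation: the dictionaries, the identifiers (CONCEPT_1, CONCEPT_2), and the function names are all hypothetical, and no actual UMLS data or LVG functionality is used.

```python
import unicodedata

# Hypothetical Spanish entries pointing at made-up concept identifiers.
SPANISH_TO_CONCEPT = {
    "diabetes": "CONCEPT_1",
    "hipertension": "CONCEPT_2",
    "presion arterial alta": "CONCEPT_2",
}

# Hypothetical English terms attached to the same concept identifiers.
CONCEPT_TO_ENGLISH = {
    "CONCEPT_1": ["diabetes"],
    "CONCEPT_2": ["hypertension", "high blood pressure"],
}


def normalize(text):
    """Lowercase, strip diacritics, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    return " ".join(stripped.split())


def translate_query(spanish_query):
    """Return English search terms for the concepts found in a Spanish query.

    Uses greedy longest-phrase matching; words with no entry are skipped.
    """
    words = normalize(spanish_query).split()
    english_terms = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase starting at position i first.
        for j in range(len(words), i, -1):
            concept = SPANISH_TO_CONCEPT.get(" ".join(words[i:j]))
            if concept is not None:
                for term in CONCEPT_TO_ENGLISH[concept]:
                    if term not in english_terms:
                        english_terms.append(term)
                i = j
                break
        else:
            i += 1  # no phrase starting here is known; move on
    return english_terms
```

For example, translate_query("Hipertensión y diabetes") yields the English terms for both concepts, which could then be passed to an English-language search engine. A real system would also need the machine-translation and statistical components listed above to handle words that have no dictionary entry.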
