Clinicians, researchers and members of the general public are increasingly using information technology to cope with the explosion in biomedical knowledge. This chapter describes the purpose of query log analysis in the biomedical domain, as well as features of that domain that are useful for query log analysis, such as controlled vocabularies (ontologies) and existing infrastructure. We focus specifically on four resources: MEDLINE, the most comprehensive bibliographic database of the world’s biomedical literature; PubMed, the search interface to MEDLINE; the Medical Subject Headings (MeSH) vocabulary; and the Unified Medical Language System (UMLS). However, the approaches discussed here can also be applied to other query logs. We conclude with a look toward the future of biomedical query log analysis.
The US National Library of Medicine (NLM) of the National Institutes of Health (NIH) develops and maintains many critical resources, including databases, knowledge sources and software tools intended to provide access to biomedical information. The NLM “collects materials and provides information and research services in all areas of biomedicine and healthcare” (“About the National Library of Medicine,” 2007). When working with query logs in the biomedical domain, we make extensive use of NLM resources, including MEDLINE; a variety of services provided through the Unified Medical Language System; and PubMed, a search interface to the biomedical literature indexed in MEDLINE.
Key Terms in this Chapter
UMLS: Unified Medical Language System.
Terminology: A set of terms (de Keizer, Abu-Hanna, & Zwetsloot-Schonk, 2000).
Navigational Query: Query intended to locate a particular article or group of articles, as opposed to satisfying a general information need (informational query) (Broder, 2002).
Consumer (of healthcare): A member of the lay public, as opposed to a researcher or clinician. Therefore, a consumer is not an expert in biomedical science or terminology.
PubMed: A freely available interface to MEDLINE created and maintained by the NLM.
Biomedicine: The broad domain of biology and healthcare including research and practice related to living organisms often focused on, but not limited to, human health and disease.
Term: Linguistic label for concepts (de Keizer, Abu-Hanna, & Zwetsloot-Schonk, 2000).
Classification or Taxonomy: A terminology where terms are arranged by “is_a” or “is_member_of” relationships into classes (de Keizer, Abu-Hanna, & Zwetsloot-Schonk, 2000).
Semantic: Of or relating to meaning in language (http://www.merriam-webster.com/dictionary/semantic accessed September 18, 2007).
Informational Query: Query intended to satisfy a general information need, as opposed to an attempt to locate a specific article or group of articles (navigational query) (Broder, 2002).
MeSH: Medical Subject Headings.
MEDLINE: A database of biomedical literature created and maintained by the US National Library of Medicine (NLM, a unit of the National Institutes of Health). MEDLINE is a bibliographic database, meaning that it contains the reference information needed to find articles, but not the actual full-text articles.
Concept: A cognitive construct based on entities in the real world such as “nose” or “anatomy” (de Keizer, Abu-Hanna, & Zwetsloot-Schonk, 2000).
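The definitions of term, concept and classification above can be made concrete with a small data structure. The following Python sketch uses a hypothetical toy vocabulary: the terms, the concept identifiers (such as `myocardial_infarction`) and the helper names `normalize_query` and `ancestors` are illustrative assumptions, not MeSH or UMLS content or APIs. Several terms map to one concept, and concepts are linked into a taxonomy by “is_a” relationships.

```python
from __future__ import annotations

from collections import deque

# Hypothetical toy terminology, for illustration only (not MeSH or UMLS data).
# Terms are linguistic labels; several terms may name the same concept.
TERM_TO_CONCEPT = {
    "heart attack": "myocardial_infarction",
    "myocardial infarction": "myocardial_infarction",
    "mi": "myocardial_infarction",
    "heart disease": "heart_disease",
    "cardiovascular disease": "cardiovascular_disease",
}

# Classification: each concept lists its direct "is_a" parents.
IS_A = {
    "myocardial_infarction": ["heart_disease"],
    "heart_disease": ["cardiovascular_disease"],
    "cardiovascular_disease": ["disease"],
    "disease": [],
}


def normalize_query(query: str) -> str | None:
    """Map a query string to a concept if the whole query matches a known term."""
    return TERM_TO_CONCEPT.get(query.strip().lower())


def ancestors(concept: str) -> list[str]:
    """Return every concept reachable via is_a links, nearest first (breadth-first)."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(IS_A.get(concept, []))
    while queue:
        parent = queue.popleft()
        if parent in seen:  # a concept may be reachable through several parents
            continue
        seen.add(parent)
        order.append(parent)
        queue.extend(IS_A.get(parent, []))
    return order


if __name__ == "__main__":
    concept = normalize_query("Heart Attack")
    print(concept)             # myocardial_infarction
    print(ancestors(concept))  # ['heart_disease', 'cardiovascular_disease', 'disease']
```

Mapping query strings to concepts and then following is_a links is one way differently worded queries in a log could be grouped under a broader topic. Real vocabularies such as MeSH are much larger and can place a concept under more than one parent, which the `seen` set in the breadth-first walk tolerates.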