The Use of Natural Language Processing for Market Orientation on Rare Diseases

Matthias Hölscher, Rudiger Buchkremer
Copyright: © 2021 | Pages: 21
DOI: 10.4018/978-1-7998-4240-8.ch010


Rare diseases in their entirety have a substantial impact on the healthcare market, as they affect a large number of patients worldwide. Governments provide financial support for diagnosis and treatment. Market orientation is crucial for any market participant to achieve business profitability. However, the market for rare diseases is opaque. The authors compare results from search engines and healthcare databases utilizing natural language processing. The approach starts with an information retrieval process that applies the MeSH thesaurus. The results are prioritized and visualized using word clouds. In total, the chapter examines 30 rare diseases and about 500,000 search results from the databases PubMed and FindZebra and the search engine Google. The authors compare their results to searches for common diseases. They conclude that FindZebra and Google provide relatively good results for the evaluation of therapies and diagnoses. However, the quantity of the findings from professional databases such as PubMed remains unsurpassed.
Chapter Preview


In this chapter, we describe different ways for a market player to obtain information about the rare disease healthcare market. As far as rare diseases are concerned, little information is available, and it is challenging to integrate the topic into a comprehensive market strategy with information flows. Stakeholders depend on search engines and databases, and it is essential to know which is the most appropriate. Only a relatively small number of articles on rare diseases exists; however, the quantity is still a challenge, especially if several rare diseases are searched simultaneously. Natural Language Processing (NLP) can provide support by finding relevant articles quickly and by helping to find the “needle in the haystack” through mathematical operations and subsequent visualizations. These methods are also an essential aid for physicians, who can usually master only a small part of the known diseases. Furthermore, NLP is crucial for patients, relatives, and other stakeholders who are not familiar with the subject matter.

A low prevalence characterizes rare diseases, and the global number of cases of each disease is low. The impression may arise that they do not impact the healthcare market. However, the situation is different: Collectively, about 10% of the US population is affected by a rare (also called orphan) disease, and many countries are undertaking efforts to provide patients with orphan diseases with high-quality medicines. In the USA, the Orphan Drug Act has been in place since 1983, and in China, efforts are being made to improve patient care (Kang et al., 2019). Rare diseases affect the medical market in at least two ways.

On the one hand, drugs for rare diseases are commonly quite expensive, which makes them visible in the market. On the other hand, it generally takes several years until a rare disease is diagnosed at all, so that unnecessary examinations and treatments cost enormous amounts of money (Svenstrup et al., 2015). Therefore, healthcare professionals need to find the right information on rare diseases quickly, and NLP can provide useful help.

The problem with transparency in the rare diseases market is that there is little information about specific rare diseases, and it can be very laborious to find orientation in the market. Thus, how do we orientate ourselves in a market about which hardly any information is available? Moreover, how do we find our way through it?

The claim that a business that improves its market orientation also improves its market performance has been made for more than 60 years. Narver and Slater (1990) introduced an efficient instrument for measuring the degree of market orientation and showed that customer and competitor orientation includes all activities to obtain information about buyers and competitors. It is essential to cover the target market and to disseminate the resulting reports internally. In 2005, Kirca et al. provided a quantitative summary of the bivariate findings regarding the antecedents and effects of market orientation and confirmed that market intelligence is a prerequisite for participating in a market successfully. Intelligence seems feasible for a transparent and information-rich market, but information on rare diseases is scarce. For a large corporation or an organization in a transparent market, it is not challenging to establish a sophisticated marketing strategy. However, if management attention is too scarce to investigate an opaque market such as the market for rare diseases, the trade-off for a strategy is an in-situ information search on the Web and in literature databases (see also Christen et al., 2009). Thus, we need to look for ways to utilize search engines to get a quick overview of the market. It is also crucial that we evaluate the results, and it is therefore advisable to compare them with those from professional databases. FindZebra is a database that focuses on rare diseases, and MEDLINE or PubMed is a database of medical articles.

Key Terms in this Chapter

Orphan/Rare Disease: An orphan or rare disease affects only very few people compared to the overall population.

Word Cloud: A plotted distribution of word frequencies in a given text. Words that occur more frequently appear larger in the visualization.

Multi-Omics/Systems Medicine: An approach that considers diseases not only as a phenotypic (organic) phenomenon. It also takes into account, for example, bacteria and viruses (microbiome), chemical reactions (metabolomics), genes (genomics), and even the (social) environment.

Market Orientation: To achieve market orientation, it is essential to know the elements of the market and their patterns of movement. The more that is known about a market, the better the resulting orientation.

PubMed/Medline: “Medline” denotes a medical database offered through the “PubMed” portal by the U.S. National Library of Medicine. To search for further medical information, it is recommended to use the databases “Embase” and “Chemical Abstracts.”

Taxonomy/Ontology: A comprehensive semantic search in texts requires the classification of terms in a subject-specific context. This approach is sometimes also called lexicon-based. The purest form of a lexicon in this respect is the thesaurus. If the terms are arranged in a hierarchical structure, the lexicon is often called a taxonomy, and if additional relations are given, an ontology. MeSH (Medical Subject Headings) is a well-known and comprehensive medical taxonomy.

Information Retrieval: A set of procedures that mainly deals with searching in text files. The procedures are often described in the context of library work.

Orphan Drug Act: A law enacted in the U.S.A. in 1983 to support R&D of medications for rare diseases. Reduced sales expectations for drugs due to low incidence can be partly compensated by government support.
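
The word-frequency idea behind the Word Cloud entry above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop-word list, the function names word_frequencies and font_sizes, the linear font scaling, and the two sample sentences are all assumptions made for illustration only.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "for", "to", "with", "by"}


def word_frequencies(texts, top_n=10):
    """Count how often each non-stop-word occurs across all texts."""
    counts = Counter()
    for text in texts:
        # Lower-case the text and keep runs of ASCII letters as tokens.
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(top_n)


def font_sizes(frequencies, min_size=10, max_size=40):
    """Scale counts linearly to font sizes, as a word cloud renderer might."""
    if not frequencies:
        return {}
    top = max(count for _, count in frequencies)
    low = min(count for _, count in frequencies)
    span = max(top - low, 1)  # avoid division by zero when all counts are equal
    return {
        word: min_size + (count - low) * (max_size - min_size) / span
        for word, count in frequencies
    }


# Invented sample snippets, not search results from the chapter.
sample = [
    "Enzyme replacement therapy is a therapy for Fabry disease.",
    "Diagnosis of Fabry disease is often delayed.",
]
freqs = word_frequencies(sample)
sizes = font_sizes(freqs)
for word, count in freqs:
    print(f"{word}: count={count}, size={sizes[word]:.0f}")
```

Words with the highest counts receive the largest font size, which matches the definition given above. A dedicated plotting library could then render the cloud from these sizes.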
