1. Introduction
Visual analytics has been shown to be effective for data exploration (Keim, 2002), but global comparisons of disease trends demand substantial computational power. Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be appropriated in any quantity at any time (Mell & Grance, 2011). This elasticity meets the demand for high computational power in data exploration, which makes cloud computing one of the major enablers of data exploration. It is part of a general shift of computer processing, storage, and software delivery away from traditional desktop computers and local servers towards the cloud (Saranya & Sunitha, 2012).
Cancer is one of the leading causes of morbidity and mortality worldwide, with approximately 14 million new cases and 8.2 million cancer-related deaths in 2012. Annual cancer cases are expected to rise to 22 million within the next two decades. The prevalence of cancer varies by gender, age, ethnicity, geographical location, economic status, and other factors. The most common causes of cancer death are cancers of the breast, lung, liver, stomach, and colon and rectum (International Agency for Research on Cancer World Health Organization, 2014). Although the age-standardized incidence rates of some cancers have shown stable trends, the prevalence of cancer has grown along with the ageing population. To better understand the progression of cancer, cancer registries were set up in different countries. The first population-based cancer registry was established in Hamburg, Germany, in 1926 (Wagner, 1991). Cancer registries have been widely used in epidemiological research, so the World Health Organization (WHO) formed the International Agency for Research on Cancer (IARC) to collect cancer registry data across countries. Descriptive studies use the registry database to examine differences in the incidence of cancer by patient characteristics (Parkin, 2006). The volume of global cancer incidence data is huge, so visual analytics can help to enhance the interpretation of disease distribution and trends. Visual analytics turns complex data into intuitive, interactive visual representations that make the meaning of the data faster and easier to understand.
The main contribution of this study is the use of a visual analytics platform (IBM Watson Analytics) to visualize differences in cancer trends and patterns embedded within data sourced from WHO cancer registries. The research aims to answer several major questions posed to us by cancer researchers, including, but not limited to: (i) what are the top-ranking forms of cancer, and how do they vary (ii) across different regions, (iii) across gender, (iv) over the years, (v) across high- and mid-income regions, and (vi) across different age groups? In the following section, we provide a visualization of the WHO data that enables us to answer these questions intuitively. In the future, such intuition can guide further explorations. For example, the patterns could be used to show the effectiveness of the colorectal cancer (CRC) screening programme in the US and to compare the results with regions that have no screening programme.
We selected the choropleth map and the traditional line chart for demonstration. The choropleth map is used to present values such as population density across different geographical regions (MacEachren, Brewer, & Pickle, 1998). In this study, regions with a high volume of cancer incidence appear in darker colors on the map. The line chart is the traditional way of showing data trends along a timeline. In this study, line charts are used to demonstrate cancer trends across regions and across population groups, such as by gender. A matrix of line charts is also developed for cross-sectional comparison between age groups across the regions.
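The shading rule behind a choropleth map can be illustrated with a short sketch. The Python code below is not the procedure used by IBM Watson Analytics; it assumes a simple equal-interval classification, and the palette, the function names (`class_breaks`, `shade_regions`), and the region counts are all hypothetical and chosen only for illustration.

```python
from bisect import bisect_right

# Sequential palette ordered from light to dark (hypothetical hex colors).
PALETTE = ["#f7fbff", "#c6dbef", "#6baed6", "#2171b5", "#08306b"]


def class_breaks(values, n_classes):
    """Return n_classes - 1 equal-interval break points between min and max."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_classes
    return [lo + step * i for i in range(1, n_classes)]


def shade_regions(incidence, palette=PALETTE):
    """Map each region to a palette color; higher incidence gets a darker color."""
    breaks = class_breaks(list(incidence.values()), len(palette))
    return {
        region: palette[bisect_right(breaks, value)]
        for region, value in incidence.items()
    }


# Made-up counts for illustration only; these are not WHO registry figures.
example = {"Region A": 120, "Region B": 480, "Region C": 950, "Region D": 40}
colors = shade_regions(example)
for region, color in colors.items():
    print(region, color)
```

Equal-interval classes are only one possible choice. Quantile classes, for instance, would spread regions more evenly across the palette when the incidence counts are heavily skewed.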