1. Introduction
Bio-imaging techniques such as fluorescence microscopy, two-photon laser scanning microscopy, and electron microscopy have become essential tools for exploring the structures and functions of organisms. With advances in automatic imaging technologies, biomedical imaging data sets have grown rapidly in recent years, ranging from X-ray and CT images for disease diagnosis to in situ hybridization (ISH) images for analyzing gene expression patterns. The number of bio-images is increasing on a scale comparable to that of the genomic revolution (Hamilton et al., 2006).
The sheer volume of biomedical image data presents significant challenges for traditional analysis methods based on manual annotation and human labeling. Automatic image processing and analysis has therefore become a popular research topic. Through the collaboration of biomedical scientists and computer scientists, the area of “bio-image informatics” (Peng, 2008), a new branch of bioinformatics, has developed, and a large number of computer-based algorithms for managing, indexing, and analyzing bio-image data sets have been introduced. Examples of such applications include automatic cell detection systems (Long, Cleveland, & Yao, 2010; Huang, Sun, & Hu, 2009), bio-image segmentation systems (Bae, Pan, Wu, & Badea, 2009; Madhloom, Kareem, Ariffin, & Zaidan, 2010), and cell phenotype classification systems (Minamikawa et al., 2003). Peng (2008) gives a good review of this area.
Histology is an essential tool in biomedical research for examining the microscopic anatomy of the cells and tissues of plants and animals in order to infer the functions of organisms. In a typical histology assay, the tissues or cells are stained so that human experts can examine their structure and annotate structural characteristics. With the deluge of histology data, there is a demand for computer-based histology image analysis, which saves human labor and reduces inter- and intra-observer variance, to support the automatic management and analysis of histology data and databases.
One of the critical tasks in this area is to classify raw histology images by phenotype. Classifying images into different categories not only helps medical scientists make comparisons within and across varieties but also helps computer scientists build efficient indexing and retrieval systems. In fact, diagnosis is itself a binary classification problem. However, the task is challenging in this setting for three reasons. First, histology images are non-stationary; in other words, each region of the raw image can have distinct characteristics. Second, variation in laboratory operating conditions increases the effect of artifacts, which in turn increases intra-class differences. Third, inter-class differences are relatively small and difficult for humans to distinguish. A comparison of performance across benchmark data sets in (Huang, Sun, & Hu, 2009) showed that the accuracies on three histology data sets, Liver Aging, Liver Gender, and Lymphoma, were lower than those on the other data sets, which demonstrates the difficulty of classifying these data sets.