The field of histopathology has reached a key transition point with the progressive move towards digital slides and automated image analysis. This chapter discusses the methods and techniques involved in automating image analysis in histopathology. Important concepts and techniques are explained for the five main areas of the image analysis workflow in histopathology: data acquisition, the digital image, image pre-processing, segmentation, and machine learning. Examples of the application of these concepts and techniques in histopathological research are then given.
Several crucial steps take place before a tissue sample can be digitally imaged and subsequently analysed. Artifacts, i.e. anything that interferes with the examination of the tissue, can be introduced during each of these steps and can greatly reduce the quality of the image and, therefore, the accuracy of the image analysis performed later.
The tissue sample is first removed during surgery, biopsy or autopsy and placed in a fixative, typically formalin, to prevent decay. It is then dehydrated by submerging it in ethanol. The sample is then permeated with paraffin and encased in a paraffin block. If the processes above are not carried out correctly, the paraffin-embedded tissue can become brittle and difficult to work with, which can lead to degraded image quality.
The paraffin-embedded tissue is now sectioned into slices between 2 and 8 µm thick and placed on a glass slide. Many problems found in the images produced later are related to this process. Often the section can warp or fold as it is being cut. Parts of the tissue can also break away from the section. To reduce the warping and folding effects, the tissue section is floated on warm water after it has been cut in a process known as mounting. The thickness of the section dramatically affects the final image and must be considered carefully. If the section is too thick, it will be difficult to automatically identify and process individual cells because they overlap. On the other hand, if the section is too thin, many of the cells will be missing their nuclei, leading to inaccuracies in automated analysis algorithms that rely on nucleus detection.
Key Terms in this Chapter
Data Acquisition: The group of techniques performed in order to achieve a digital image of a tissue sample, suitable for analysis.
RGB: Red, Green, Blue colour space. A representation of colour using the three primary colours, red, green and blue. All other colours are formed by different combinations of these three.
Machine Learning: The automation of the classification of different regions in the image as objects.
PCA: Principal Component Analysis. A dimension reduction technique used to transform a feature set into a new feature set whose features are ordered by the amount of variance they capture in the data. The new features that capture the least variance can subsequently be removed.
Histopathology: The study of disease by microscopically analysing tissue samples.
KNN: K-Nearest Neighbour. A classifier that decides on the class of an unlabeled sample based on its k nearest labeled neighbouring samples according to some distance measure (usually Euclidean).
LDA: Linear Discriminant Analysis. A dimension reduction technique used to transform a feature set into a smaller set of features that best discriminates between the different classes in the data.
H&E: Haematoxylin and Eosin. A staining technique used in Histopathology that pigments nucleic structures blue and cytoplasmic structures pink.
Segmentation: The process of building regions within an image which better represent the real world objects present in the image.
HSV: Hue, Saturation, Value colour space. A representation of colour that better represents human perception of differences in colour. The intensity (value) is separated from the two terms that we use to define colour, hue (perceived colour) and saturation (purity or vividness of the colour).
Automated Image Analysis: Encompasses the automation of every step in the workflow of image analysis in histopathology, from the production of the image to the high-level understanding of the different objects present in the digital image produced.
SVD: Singular Value Decomposition. A matrix factorisation whose operations are commonly carried out in order to perform PCA on a set of features.
Image Pre-Processing: The group of techniques carried out globally on an image in order to allow more accurate analysis of the image.
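FFNN: Feed Forward Neural Network. A classifier made up of an artificial neural network (ANN) whose layers are cascaded, so that information flows in one direction from input to output.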
TMA: Tissue Microarray. An array of tissue samples from different sources, collected on one paraffin block.
SVM: Support Vector Machine. A classifier that builds an optimum decision boundary between classes based on a subset of labeled samples closest to the boundary. These samples are known as support vectors.
CMYK: Cyan, Magenta, Yellow, Key colour space. A representation of colour using the three secondary colours, cyan, magenta and yellow, together with black (the key). All other colours are formed by subtracting different combinations of these from white.
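The relationship between the RGB, HSV and CMYK entries above can be illustrated with a minimal sketch. It is not taken from the chapter: the pixel value is a hypothetical eosin-like pink chosen for illustration, the helper rgb_to_cmyk is an assumed name, and the CMYK formula is the simple profile-free conversion rather than anything a scanner or printer would actually apply. Only the Python standard library is used.

```python
import colorsys


def rgb_to_cmyk(r, g, b):
    """Convert RGB components in [0, 1] to CMYK components in [0, 1].

    Uses the naive, device-independent formula (an assumption made for
    illustration); real imaging pipelines rely on colour profiles.
    """
    k = 1.0 - max(r, g, b)          # key (black) is what is missing from white
    if k == 1.0:                    # pure black: avoid dividing by zero
        return 0.0, 0.0, 0.0, 1.0
    c = (1.0 - r - k) / (1.0 - k)   # cyan subtracts red
    m = (1.0 - g - k) / (1.0 - k)   # magenta subtracts green
    y = (1.0 - b - k) / (1.0 - k)   # yellow subtracts blue
    return c, m, y, k


# Hypothetical 8-bit RGB pixel (illustrative value, not measured data).
pixel = (230, 130, 170)
r, g, b = (channel / 255.0 for channel in pixel)

# HSV separates intensity (value) from hue and saturation.
h, s, v = colorsys.rgb_to_hsv(r, g, b)
c, m, y, k = rgb_to_cmyk(r, g, b)

print("HSV :", round(h, 3), round(s, 3), round(v, 3))
print("CMYK:", round(c, 3), round(m, 3), round(y, 3), round(k, 3))
```

In this sketch the value component is simply the largest of the three RGB channels, which is why HSV is often convenient when stain colour needs to be analysed independently of brightness.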