1. Introduction
Histopathology is the branch of histology in which diseased tissues or cells are examined under a microscope. Typically, the pathologist examines biopsies stained with hematoxylin and eosin (H&E) for cancer identification, grading, and prognosis. However, diagnosing biopsies is a complex task that requires years of experience, which results in high variance between pathologists' diagnoses. To reduce this inter-observer variability, computer-aided diagnosis (CAD) systems are employed as a second reader.
Previously, machine learning (ML) methods were among the most popular approaches in CAD systems. Their overall process relies on four main steps: detection of regions of interest, feature extraction (GLCM (Vujasinovic et al., 2015), LBP (Hervé et al., 2011)), feature selection (RFA (Dif et al., 2019), MVO (Dif et al., 2018)), and classification (Komura et al., 2018). However, in histopathology, extracting handcrafted features is one of the greatest challenges because of the complex structure of cells and tissues; moreover, tumors can present distinct cytological features (Roberto et al., 2017). Recently, there has been a surge of interest in deep learning (DL) algorithms for medical image analysis. The benefit of these methods over traditional ML algorithms is their capacity to learn data representations: the relevant features are learned as part of the classification process rather than designed by hand.
For histopathological image analysis, biopsy slides are digitized as whole slide images (WSI) (Janowczyk et al., 2016) by whole slide digital scanners (WSD) (Al-Janabi et al., 2012). The high resolution of WSI has made digital pathology a popular application for DL methods (Litjens et al., 2017) in different tasks, such as mitosis detection (Cireşan et al., 2013), gland segmentation (Kainz et al., 2015), and lymphoma subtype classification (Janowczyk et al., 2016). On the other hand, the limited number of available medical images and the difficulty of annotating them are leading causes of overfitting. In the literature, several attempts have been made to mitigate this problem by improving generalization through various techniques: transfer learning (Ng et al., 2015) and regularization strategies such as dropout (Srivastava et al., 2014) and ensemble learning (Ju et al., 2018).
The purpose of this research is to improve the generalization capacity of convolutional neural networks for lymphoma subtype classification. Our work takes advantage of various regularization methods in a deep learning framework: data augmentation, the use of small models (MobileNet), the selection of a suitable optimizer, and checkpoint ensemble model selection. This study provides the first comprehensive assessment of checkpoint ensembling in histopathological applications based on the optimized MobileNet architecture.
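To make the idea of checkpoint ensembling concrete, the following minimal sketch combines the class probabilities produced by several checkpoints saved during a single training run. It assumes that the checkpoints are combined by simple averaging of their softmax outputs. This section does not specify the combination or selection rule, so the averaging rule, the function names, and the toy probabilities are illustrative assumptions and not the method evaluated in this paper.

```python
from typing import List, Sequence


def average_checkpoint_predictions(
    checkpoint_probs: Sequence[Sequence[Sequence[float]]],
) -> List[List[float]]:
    """Average class probabilities over checkpoints.

    checkpoint_probs[c][i][k] is the probability that checkpoint c assigns
    to class k for sample i (hypothetical layout, assumed for this sketch).
    """
    n_checkpoints = len(checkpoint_probs)
    if n_checkpoints == 0:
        raise ValueError("at least one checkpoint is required")
    n_samples = len(checkpoint_probs[0])
    n_classes = len(checkpoint_probs[0][0])

    averaged = []
    for i in range(n_samples):
        # Mean probability of each class across all checkpoints.
        row = [
            sum(checkpoint_probs[c][i][k] for c in range(n_checkpoints))
            / n_checkpoints
            for k in range(n_classes)
        ]
        averaged.append(row)
    return averaged


def predict_labels(averaged_probs: Sequence[Sequence[float]]) -> List[int]:
    """Return the index of the most probable class for each sample."""
    return [max(range(len(row)), key=row.__getitem__) for row in averaged_probs]


# Toy example: 3 checkpoints, 2 samples, 3 classes (made-up numbers).
toy_probs = [
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],  # checkpoint 1
    [[0.4, 0.5, 0.1], [0.1, 0.3, 0.6]],  # checkpoint 2
    [[0.5, 0.2, 0.3], [0.2, 0.3, 0.5]],  # checkpoint 3
]
ensemble_labels = predict_labels(average_checkpoint_predictions(toy_probs))
```

In this toy example, the second checkpoint alone would assign the first sample to class 1, whereas the averaged prediction assigns it to class 0. Smoothing out such disagreements between checkpoints is the regularizing effect that ensembling aims for.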
The remainder of the paper proceeds as follows: Section 1 presents related work on automated methods for lymphoma subtype classification and on ensemble learning methods in deep learning. Section 2 details the methods used. Section 3 explains the proposed framework. Section 4 presents and discusses the obtained results, and the last part concludes this work.