Teeth and Landmarks Detection and Classification Based on Deep Neural Networks

Lyudmila N. Tuzova (Denti.AI, Russia), Dmitry V. Tuzoff (Steklov Institute of Mathematics in St. Petersburg, Russia), Sergey I. Nikolenko (Steklov Institute of Mathematics in St. Petersburg, Russia) and Alexey S. Krasnov (Dmitry Rogachev National Research Center of Pediatric Hematology, Oncology and Immunology, Russia)
Copyright: © 2019 | Pages: 22
DOI: 10.4018/978-1-5225-6243-6.ch006


Over the past decade, deep neural networks have developed rapidly in many domains, including medicine. Convolutional neural networks (CNNs), deep neural network structures commonly used for image interpretation, brought a breakthrough in computer vision and became the state-of-the-art technique for various image recognition tasks, such as image classification, object detection, and semantic segmentation. In this chapter, the authors provide an overview of deep learning algorithms and review the available literature on dental image analysis with CNN-based methods. The study focuses on the problems of landmark and tooth detection and classification, as these tasks are an essential part of dental image interpretation both in clinical dentistry and in human identification systems based on dental biometric information.
Chapter Preview


Over the last decade, computer vision algorithms based on deep learning have been successfully applied across health and medicine to a number of medical imaging tasks, such as detection and staging of cancer, lung segmentation, diagnosis of colitis, detection and classification of brain diseases, and many others (Lee, et al., 2017; Rezaei, Yang, & Meinel, 2017; Liu, et al., 2017; Litjens, et al., 2017; Shen, Wu, & Suk, 2017). In dentistry, several works have applied deep learning to dental radiograph analysis (Miki, et al., 2016; Lee, Park, & Kim, 2017; Wang, et al., 2016; Ö. Arik, Ibragimov, & Xing, 2017; Tuzoff, et al., 2018). However, deep learning in dentistry remains an underdeveloped area of research, even though deep neural networks provide state-of-the-art results in many kinds of image recognition tasks (Lee, et al., 2017; Litjens, et al., 2017; LeCun, Bengio, & Hinton, 2015), and problems such as teeth and landmark detection appear to be straightforward object detection problems that should be amenable to modern computer vision approaches based on deep learning.

This chapter reviews the relevant literature on deep learning methods, specifically convolutional neural networks (CNNs), applied to the tasks of teeth and landmark detection and classification by type or tooth number. These tasks comprise an important part of dental X-ray image analysis. The results can be used to automatically fill in a patient's dental records for medical history and treatment planning, preprocess an image for further pathology detection, improve the speed and accuracy of postmortem human identification, and perform anatomical measurements, among other applications. Deep learning methods have also been studied for pathology detection (Wang, et al., 2016; Oliveira & Proença, 2011; Imangaliyev, et al., 2016); however, a review of computer-aided disease diagnostics is out of the scope of the present study.

CNNs represent state-of-the-art deep learning architectures commonly applied to image recognition tasks. Modern object classification, detection, and segmentation approaches based on CNNs have shown promising results, often outperforming methods based on traditional computer vision or other machine learning techniques. An important advantage of deep learning over traditional computer vision and other machine learning approaches is that deep learning algorithms do not rely on handcrafted feature extraction and can achieve high performance working with raw input, such as the pixel values of X-ray images. These methods allow interpreting medical images even if the images are noisy, taken with different equipment, or acquired in a different setting than the data used for training the model. Despite the increasing popularity of CNNs, challenges in applying such architectures remain. One of the most significant limitations is the amount of annotated data required for effective model training.

A number of other deep learning models have previously been studied for medical image analysis, including stacked autoencoders (SAEs) and deep belief networks (DBNs) (Shen, Wu, & Suk, 2017; Litjens, et al., 2017). However, CNNs currently represent the state-of-the-art architectures for image recognition tasks. Shen, Wu, & Suk (2017) explain the high performance of CNNs in image interpretation by the specific properties of CNN architectures, which make better use of the spatial information in images, whereas most other deep learning models process the input as a one-dimensional vector. Moreover, training models such as DBNs and SAEs is a complex task that combines an unsupervised pre-training phase with a subsequent supervised fine-tuning step. A number of prior studies demonstrated that CNN models outperform other deep learning techniques on image interpretation tasks when annotated data is available and end-to-end supervised learning can be performed (Wu, 2015; Song, Zhao, Luo, & Dou, 2017).
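The contrast between spatially aware processing and one-dimensional vector input can be illustrated with a minimal sketch. It is not taken from the chapter: the toy 4x4 image, the two-weight edge kernel, and the helper names `conv2d_valid` and `flatten` are all assumptions made for illustration. The sketch slides one shared kernel over the image and compares its weight count with that of a fully connected layer producing an output of the same size from the flattened image.

```python
from typing import List

Image = List[List[float]]


def conv2d_valid(image: Image, kernel: Image) -> Image:
    """Slide one shared kernel over the image (stride 1, no padding).

    This is the cross-correlation that deep learning libraries usually
    call "convolution": the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def flatten(image: Image) -> List[float]:
    """Turn the 2D image into a 1D vector, discarding the row structure."""
    return [pixel for row in image for pixel in row]


# Hypothetical toy image: dark left half, bright right half.
image = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]

# Hypothetical kernel: right pixel minus left pixel (vertical edge detector).
kernel = [[-1.0, 1.0]]

feature_map = conv2d_valid(image, kernel)

# The same two weights are reused at every position of the image.
conv_params = len(kernel) * len(kernel[0])

# A fully connected layer from the flattened image to an output of the
# same size needs one weight per (input pixel, output unit) pair.
n_outputs = len(feature_map) * len(feature_map[0])
dense_params = len(flatten(image)) * n_outputs
```

In this toy case the feature map responds only in the column where the edge lies, in every row, because the same weights are applied at every position. The fully connected alternative would have to learn that pattern separately for each output unit.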

Key Terms in this Chapter

Image Classification: A classical computer vision problem where the task is to label an image with the particular class within a known set of possible classes.

Loss Function: A function used in supervised learning to measure the difference between the prediction and the ground truth.

Feed-Forward Neural Network: An artificial neural network in which the input data is processed in one direction, with no cycles. The network takes its input and transforms it through multiple layers of neurons to produce a final result that depends on the specific task (e.g., class scores for a classification problem or real values for a regression problem).

Semantic Segmentation: A computer vision problem where the task is to identify various objects on a single image on a pixel-level basis.

Backpropagation: The algorithm used in artificial neural networks to calculate the gradient, the vector of partial derivatives, which is then used to update the model parameters. The algorithm is based on the chain rule of differentiation.

Stochastic Gradient Descent (SGD): The algorithm that aims to minimize the loss function for supervised learning algorithms, including neural networks. It calculates the gradient, e.g., using backpropagation, and then changes the parameters of the model in the negative gradient direction. It works iteratively and typically uses a mini-batch of training samples at each step.

Object Detection: A computer vision problem that aims to locate a varying number of objects of different classes on a single image.

Convolutional Neural Network (ConvNet or CNN): A special type of feed-forward neural network optimized for image data processing. The key features of the CNN architecture include weight sharing, pooling layers, and deep structures with multiple hidden layers.
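The Loss Function, Backpropagation, and Stochastic Gradient Descent entries above can be tied together in a small sketch. It is an illustration under stated assumptions, not the chapter's method: the model is a single linear unit rather than a deep network, the data is a hypothetical noise-free line y = 2x + 1, and the function names, learning rate, batch size, and epoch count are arbitrary choices.

```python
import random
from typing import List, Tuple


def forward(w: float, b: float, xs: List[float]) -> List[float]:
    """Feed-forward pass of a one-weight linear model."""
    return [w * x + b for x in xs]


def mse_loss(preds: List[float], targets: List[float]) -> float:
    """Loss function: mean squared difference between prediction and truth."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def gradients(w: float, b: float,
              xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Gradient of the loss w.r.t. (w, b) via the chain rule.

    dL/dw = sum_i dL/dpred_i * dpred_i/dw, with dpred_i/dw = x_i
    dL/db = sum_i dL/dpred_i * dpred_i/db, with dpred_i/db = 1
    """
    preds = forward(w, b, xs)
    n = len(xs)
    dloss_dpred = [2.0 * (p - y) / n for p, y in zip(preds, ys)]
    grad_w = sum(g * x for g, x in zip(dloss_dpred, xs))
    grad_b = sum(dloss_dpred)
    return grad_w, grad_b


def sgd(xs: List[float], ys: List[float], lr: float = 0.1,
        batch_size: int = 4, epochs: int = 200,
        seed: int = 0) -> Tuple[float, float]:
    """Mini-batch SGD: step against the gradient of each mini-batch."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            bx = [xs[i] for i in batch]
            by = [ys[i] for i in batch]
            grad_w, grad_b = gradients(w, b, bx, by)
            w -= lr * grad_w  # move in the negative gradient direction
            b -= lr * grad_b
    return w, b


# Hypothetical training data: 20 points on the line y = 2x + 1.
xs = [-1.0 + 2.0 * i / 19 for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]

w, b = sgd(xs, ys)
final_loss = mse_loss(forward(w, b, xs), ys)
```

On this noise-free data the learned `w` and `b` approach the slope and intercept that generated the targets. In a deep network the same chain-rule step is repeated layer by layer, which is what backpropagation automates.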
