Special Offers
- IGI Global’s New Emerging Topic e-Book Collections
  Acquire highly focused and affordable Cutting-Edge Peer-Reviewed Research Content through a selection of 17 topic-focused e-Book Collections discounted up to 90%, compared to list prices. Collection topics include Artificial Intelligence, Data Science, Language Learning, Marketing and Customer Relations, Sustainability, and many more. Hosted on the InfoSci^® platform, these collections feature no DRM, no additional cost for multi-user licensing, no embargo of content, full-text PDF & HTML format, and more.
  Learn More
- Open Access Book (Free Access) - Encyclopedia of Information Science and Technology, Sixth Edition (ISBN: 9781668473665)
  The Encyclopedia of Information Science and Technology, Sixth Edition) continues the legacy set forth by the first five editions by providing comprehensive coverage and up-to-date definitions of the most important issues, concepts, and trends pertaining to technological advancements and information management within a variety of settings and industries. The entire book is being published under open access.
  Read Now
- Open Access Book (Free Access) - Food Sustainability, Environmental Awareness, and Adaptation and Mitigation Strategies for Developing Countries (ISBN: 9781668456293)
  Food Sustainability, Environmental Awareness, and Adaptation and Mitigation Strategies for Developing Countries provides information on the recent technology, mitigation, and environmental protection that must be applied for food sustainability in developing countries. This book is being published under Platinum Open Access through funding from Diponegoro University, Indonesia.
  Read Now
- Open Access Book (Free Access) - New Models of Higher Education: Unbundled, Rebundled, Customized, and DIY (ISBN: 9781668438091)
  The Walmart Corporation and the Lumina Foundation have provided funding to make New Models of Higher Education: Unbundled, Rebundled, Customized, and DIY fully open access, completely removing any paywall between scholars in education and the latest research on new models for the future of higher education.
  Read Now
- Open Access Book (Free Access) - Handbook of Research on the Global View of Open Access and Scholarly Communications (ISBN: 9781799898054)
  Through a collaboration between IGI Global and the University of North Texas, the Handbook of Research on the Global View of Open Access and Scholarly Communications has been published as fully open access, completely removing any paywall between researchers of any field, and the latest research on the equitable and inclusive nature of Open Access and all of its complications.
  Read Now
Books
- - Books by Subject
  - Business, Administration, & Management
  - Scientific, Technical, & Medical (STM)
  - Education
  - Books by Field
Journals
- - Journals
  - OnDemand Journal Articles
  - Journals by Subject
  - Business, Administration, & Management
  - Scientific, Technical, & Medical (STM)
  - Education
  - Journals by Field
e-Collections
Open Access
- View All Open Access Opportunities
  Search across all of IGI Global’s available open access publishing opportunities to unleash your research potential.
  Find an Open Access Journal for Your Next Manuscript
  Search across all of IGI Global’s available open access publishing opportunities to unleash your research potential.
  Submit an Open Access Book Proposal
  Learn more about open access book publishing and how it can propel your research forward in the field.
  Convert Your Work to Open Access
  Already published? You can convert your work to open access to increase its impact through IGI Global’s Restrospective Open Access Program.
  Utilize Open Access Collection Database
  Open up your research potential by utilizing our open access content or integrating the open access collection into your library
  Consider Open Access Agreements
  For Libraries: consider no-cost or investment-level open access agreements with IGI Global to support your faculty's research endeavors.
  Search Funding Resources
  Looking for additional funding resources to support your open accesss endeavors? View industry resources compiled by our open access team.
  Review Open Access Policies & Ethical Guidelines
  Considering IGI Global to publish your work under open access? Review IGI Global’s open access policies and ethical guidelines
Publish with Us
Resources
- - Instructors
  - Course Adoption
  - Teaching Cases
  - K-12 Online Learning Collection
  - Authors and Editors
  - eEditorial Discovery^® System
  - Peer Review Process
  - Ethics and Malpractice
  - COPE Membership
  - Fair Use Policy
  - Open Access Publishing
  - FAQ
Catalogs
About Us
Newsroom

Machine Learning in Python: Diabetes Prediction Using Machine Learning

Astha Baranwal, Bhagyashree R. Bagwe, Vanitha M

Source Title: Handbook of Research on Applications and Implementations of Machine Learning Techniques

DOI: 10.4018/978-1-5225-9902-9.ch008

OnDemand:

(Individual Chapters)

Available

$37.50

Current Special Offers

No Current Special Offers

Abstract

Diabetes is a disease of the modern world. The modern lifestyle has led to unhealthy eating habits causing type 2 diabetes. Machine learning has gained a lot of popularity in the recent days. It has applications in various fields and has proven to be increasingly effective in the medical field. The purpose of this chapter is to predict the diabetes outcome of a person based on other factors or attributes. Various machine learning algorithms like logistic regression (LR), tuned and not tuned random forest (RF), and multilayer perceptron (MLP) have been used as classifiers for diabetes prediction. This chapter also presents a comparative study of these algorithms based on various performance metrics like accuracy, sensitivity, specificity, and F1 score.

Chapter Preview

Top

Introduction

Diabetes is a disease which happens when the glucose level of the blood becomes high, which eventually leads to other health problems such as heart diseases, kidney disease etc. Several data mining projects have used algorithms to predict diabetes in a patient. Though, in most of these projects, nothing is mentioned about the dangers of diabetes in women post-pregnancies. While data mining has been successfully applied to various fields in human society, such as weather prognosis, market analysis, engineering diagnosis, and customer relationship management, the application in disease prediction and medical data analysis still has room for improvement in accuracy.

Machine learning relates closely to Artificial Intelligence (AI) and makes software applications predict outcomes through statistical analysis. The algorithms used allow for reaching an optimal accuracy rate in predicting the output from the input data. Machine learning follows similar processes used in data mining and predictive modeling. They recognize patterns through the data entered and then adjust the actions of the program accordingly.

Machine learning algorithms are categorized as supervised learning and unsupervised learning. Supervised learning requires input data and the desired output data to build a training model. The training model is built by a data analyst or a data scientist. A feedback is then furnished concerning the accuracy of the model and other performance metrics during algorithm training. Revising is done as needed. Once the training phase is completed, the model can predict outcomes for new data. Classification is one of the many data mining tasks. Classification comes under supervised learning which implies that the machine learns through examples in Classification. In classification, every instance from the dataset is classified into a target value. Classification can either be binary or multi-label. Sometimes, one particular instance can also have multiple classes known as multi-class classification. Classification algorithms are majorly used for prediction and come under the category of predictive learning.

Unsupervised learning is used to draw inferences from the input data which do not have any labeled responses. This data is not categorized, labeled or classified into classes. Clustering analysis, one of the most common unsupervised learning method, is used to find hidden patterns in data or to form groups based on the input data.

While machine learning models have been around for decades, they have gained a new momentum with the rise of AI. Deep learning models are now used in most of the advanced AI applications. If these models are implemented for medical uses, they could be revolutionary for the society. Diagnosis of diseases like diabetes would be easier than ever. Machine learning in medical diagnosis applications fall under three classes: Pathology, Oncology and Chatbots. Pathology deals with the diagnosis of diseases with the help of machine learning models created with the data of diagnostic measurements of the patients. Oncology uses deep learning models to determine cancerous tissues in patients. Chatbots designed using AI and machine learning techniques can identify patterns in the symptoms of the patients and suggest a potential diagnosis or it can recommend further courses of action. This chapter falls under the pathological uses of machine learning as the model created will give diagnosis of whether a patient is diabetic or not.

Key Terms in this Chapter

F1 Score: F1 score is a combination function of precision and recall. It is used when we need to seek a balance between precision and recall.

Accuracy: It is a metric used to predict the correctness of a machine learning model. The model is trained using the train data and a classifier is built. The test data is used to cross validate the classifier model. The percentage of correctly classified instances is termed as accuracy.

Area Under the Curve (AUC) Score: Area under the curve (AUC) is a binary classification metric. It considers all the possible thresholds. Different threshold values result in distinct true positive/false positive rates. As the threshold is decreased, more true positives (but also more false positives) instances are discovered.

Multilayer Perceptron: Multilayer perceptron falls under artificial neural networks (ANN). It is a feed forward network that consists of a minimum of three layers of nodes- an input layer, one or more hidden layers and an output layer. It uses a supervised learning technique, namely, back propagation for training. Its main advantage is that it has the ability to distinguish data that is not linearly separable.

Hyper-Parameter Tuning: The model architecture is defined by several parameters. These parameters are referred to as hyper parameters. The process of searching for an ideal model architecture for optimal accuracy score is referred to as hyper parameter tuning.

Sensitivity/Precision: It is the ratio of true positives to the sum of the true positive and false negative.

Specificity: It is the ratio of true negatives to the sum of the true negatives and false positives.

Supervised Learning: Machine learning is broadly classified into two: supervised learning and unsupervised learning. In supervised learning, the machine learns from examples. Historical or train data is needed which is given as an input to the machine and a classifier model is formed. A supervised algorithm also needs a target value. On the contrary, unsupervised learning algorithms need neither the train data nor the target value.

Logistic Regression: Logistic regression is a classification algorithm that comes under supervised learning and is used for predictive learning. Logistic regression is used to describe data. It works best for dichotomous (binary) classification.

Recall: Recall is the ratio of true positives to the sum of true positives and false negatives.

Complete Chapter List

Search this Book:

Reset

MLA

APA

Chicago

Export Reference

Machine Learning in Python: Diabetes Prediction Using Machine Learning

Abstract

Introduction

Key Terms in this Chapter

Complete Chapter List