Machine Learning in the Real World

Stylianos Kampakis
Copyright: © 2023 | Pages: 16
DOI: 10.4018/978-1-7998-9220-5.ch104
Abstract

Most data scientists and machine learning practitioners focus on algorithm development and implementation. However, the successful application of data science in an organisation cannot be separated from business objectives and organisational dynamics. This way of thinking can feel foreign to data scientists who focus mostly on technical details. The goal of this article is to outline some of the considerations that a data scientist needs to take into account when implementing data science within an organisation. More specifically, it discusses data strategy, data science processes, and recent developments such as MLOps.
Introduction

Data science cannot be disentangled from organizational demands. A famous graph (Figure 1) produced by Brendan Tierney (2012) demonstrates how data science is integrated within an organization.

Figure 1. A concise summary of what data science is (Tierney, 2012)

This well-known diagram shows that data science is surrounded by a cycle of “soft skills” and business-related terms like “business strategy.” In other words, data science is an applied discipline whose aim is to deliver impact within an organization.

When machine learning is seen in isolation, it can be treated as an academic exercise, and it is acceptable for the researcher to deal with questions such as convergence bounds and metrics like categorical cross-entropy. When machine learning is used in practice, on the other hand, it is called data science, and research questions matter only insofar as they translate into revenue. Metrics like the RMSE are replaced with KPIs, and the end-to-end process needs to translate into tangible impact for an enterprise.
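The gap between a technical metric and a KPI can be illustrated with a minimal sketch. The demand-forecasting scenario, the function `forecast_cost`, and its cost parameters are hypothetical assumptions introduced here for illustration; they do not come from the chapter. Under those assumptions, two forecasts with the same RMSE lead to different business costs.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def forecast_cost(actual, predicted, stockout_cost=5.0, holding_cost=1.0):
    """Hypothetical KPI: cost of under- and over-forecasting demand.

    Assumes each unit of unmet demand costs `stockout_cost` and each
    unit of surplus stock costs `holding_cost` (illustrative values).
    """
    total = 0.0
    for a, p in zip(actual, predicted):
        if p < a:
            total += (a - p) * stockout_cost  # under-forecast: lost sales
        else:
            total += (p - a) * holding_cost   # over-forecast: unsold stock
    return total


actual = [100, 100, 100, 100]
model_a = [90, 110, 90, 110]    # misses by 10 in both directions
model_b = [110, 110, 110, 110]  # always over-forecasts by 10

for name, preds in [("model_a", model_a), ("model_b", model_b)]:
    print(name, rmse(actual, preds), forecast_cost(actual, preds))
```

Both models have an RMSE of 10, yet under the assumed cost structure the second is considerably cheaper for the business, which is why the evaluation criterion has to be chosen with the business objective in mind.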

The successful application of data science requires a set of steps within the organization.

Data science and machine learning are often taught in isolation, as if the successful practice of those disciplines were simply about implementing algorithms. However, experience has demonstrated that the successful implementation of data science requires covering other aspects, such as data strategy.

This chapter aims to assist data scientists and machine learning practitioners in engaging with the larger picture.

Background

The first step is the successful design and execution of a data strategy. Data strategy refers to the long-term plan and goals of a business with regard to its data. Data strategy encapsulates business strategy, data governance, and data science and requires the participation of business leaders and technical experts.

The second step is the successful choice and implementation of a data science process. A data science process represents a structured approach to data science and machine learning projects. For example, methodologies like Agile (Dingsøyr et al., 2012) and Scrum (Hossain et al., 2011) have helped structure and define the work of software developers. Similar methodologies have appeared in data science over the last few years, because its requirements differ from those of traditional software development.

The third step is to consider recent developments in machine learning and AI, such as MLOps, which streamline the training, deployment, and execution of machine learning models. MLOps stands for machine learning operations; the term is modeled on DevOps. Amazon Web Services (AWS), one of the pioneers of this field, defines DevOps on its official website as:

DevOps is the combination of cultural philosophies, practices, and tools that increases an organization’s ability to deliver applications and services at high velocity: evolving and improving products at a faster pace than organizations using traditional software development and infrastructure management processes. This speed enables organizations to better serve their customers and compete more effectively in the market. (Amazon Web Services, n.d.)

In a similar spirit, MLOps combines a set of philosophies and practices to streamline the development, testing, and deployment of machine learning algorithms.
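As a rough illustration of what streamlining development, testing, and deployment can mean, the sketch below chains the three stages into a single automated run. Everything in it is an assumption made for illustration: the function names, the mean-predictor "model", and the `max_error` threshold are hypothetical and do not describe any particular MLOps tool.

```python
from statistics import mean


def train(rows):
    """Stand-in for training: a model that predicts the mean target."""
    average = mean(y for _, y in rows)
    return lambda x: average


def evaluate(model, rows):
    """Mean absolute error of the model on held-out rows."""
    return mean(abs(model(x) - y) for x, y in rows)


def run_pipeline(train_rows, test_rows, max_error):
    """Train, test, and deploy only if the test error is acceptable."""
    model = train(train_rows)
    error = evaluate(model, test_rows)
    deployed = error <= max_error  # automated quality gate before release
    return {"error": error, "deployed": deployed}


train_rows = [(1, 10), (2, 12), (3, 14)]
test_rows = [(4, 11), (5, 13)]

print(run_pipeline(train_rows, test_rows, max_error=2))
print(run_pipeline(train_rows, test_rows, max_error=0.5))
```

The design point is that the deployment decision is made by the pipeline itself, based on a test result, rather than by a manual hand-off between teams.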

Data Strategy: An Overview

Companies must be able to manage vast quantities of data to succeed in the current business environment. Unfortunately, despite many companies instituting data-management and chief data officer roles, they are still playing catch-up.

Key Terms in this Chapter

Business Strategy: Business strategy is a term used to describe the process of defining the direction and scope of a business. It also covers developing and implementing the plans for achieving the resulting goals.

MLOps: MLOps (short for machine learning operations) refers to the automation and streamlining of machine learning pipelines within a deployment setting.

Data Science Lifecycle: The data science lifecycle is a process that starts with the collection of raw data, followed by cleaning and processing, then modeling and prediction, and finally deployment.

Data Strategy: The design of processes for data collection and manipulation with the aim of adding value to a business objective.

Data Science Process: A general guideline that advises data science teams and stakeholders on how to design and execute data science projects.
