Data Science for Industry 4.0

Indraneel Dabhade (O Automation, India)
Copyright: © 2023 |Pages: 14
DOI: 10.4018/978-1-7998-9220-5.ch004


The proliferation of smart devices, coupled with advances in computing technologies, has aided smart decision-making. Though the term data science is of recent origin, the field draws on cornerstone computing and statistical concepts to solve data-specific challenges. The role of a data scientist is to extract maximum information from raw data. This role has spread across disciplines but acts only as a supplement to actual domain knowledge. The field of data science encompasses analytics, artificial intelligence, and statistical modeling. Information security has not always been integral to the data science ecosystem, but an increasingly connected world with heightened reliance on data has encouraged data scientists to be well-versed in data security. This article provides an overview of commercial applications of data science, machine learning, and big data.
Chapter Preview

Learning Machines

Thinking is a direct product of consciousness levels. Dehaene et al. (2017) distinguish two crucial dimensions of conscious computation: C1, the global availability of relevant information, and C2, a cognitive system that monitors itself. The current state of machine learning algorithms encompasses both dimensions. The most complex learning challenges lie in modeling systems that replicate human emotions. Fear is one such emotion, difficult to capture holistically with static rules; machines capable of learning fear display better autonomous behavior (Hutson, 2019). The safest and most reliable systems are those that exhibit zero errors.

One of the earliest contributors to experimental machine learning, Arthur Samuel, while devising procedures for a machine to play checkers, introduced the concepts of 'rote learning' and 'learning by generalization' (Samuel, 1959). Donald Michie expounded on this idea with the 'Parable of Self-Improvement,' which states '...rarely occurring problems will gravitate to the bottom and frequently occurring problems to the top' (Michie, 1968). Michie asserts:

  • that the apparatus of evaluation associated with any function shall comprise a “rule part” (computational procedure) and a “rote part” (lookup table);

  • that evaluation in the computer shall on each given occasion proceed either by rule, or by rote, or by a blend of the two, solely as dictated by the expediency of the moment;

  • that the rule versus rote decisions shall be handled by the machine behind the scenes; and

  • that various kinds of interaction be permitted to occur between the rule part and the rote part.
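Michie's rule-part/rote-part scheme is essentially what is now called memoization. A minimal sketch of the idea, using Python's standard-library cache (the function and its contents are illustrative, not from the chapter):

```python
from functools import lru_cache

# "Rule part": the computational procedure below.
# "Rote part": the lookup table that lru_cache maintains behind the scenes.
# Frequently requested arguments stay in the cache while rarely used entries
# are evicted once maxsize is exceeded -- echoing Michie's parable, in which
# frequently occurring problems gravitate to the top of the rote memory.

@lru_cache(maxsize=128)
def evaluate(n: int) -> int:
    """Rule part: compute the n-th Fibonacci number recursively."""
    if n < 2:
        return n
    return evaluate(n - 1) + evaluate(n - 2)

print(evaluate(30))  # -> 832040; repeated subproblems are answered by rote
```

The rule-versus-rote decision is handled entirely by the decorator, just as Michie stipulates that the machine should manage it behind the scenes.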

Today, machine learning algorithms have evolved to handle greater statistical complexity. Nonmonotonicity (i.e., the rejection and revision of prior decisions) and the ability to work with high-dimensional data have added power to learning models.


Big Computing

Jim Gray's seminal work on data processing in the 1990s laid the foundation for high-speed access to data residing on nodes. In his 2007 lecture to the Computer Science and Telecommunications Board of the National Research Council, he introduced 'Data-Intensive Science' as a fourth research paradigm alongside the existing three (empirical, theoretical, and computational science), thus defining 'eScience' as the synthesis of technology and science. 'GrayWulf' set an example for other applications, including CERN's Large Hadron Collider (LHC), the Virtual Observatory (VO) for astronomical data, and the National Center for Biotechnology Information (NCBI) in genomics (Szalay, 2009).

Users, capitalizing on the interconnected nature of global information networks, push to scale computational processes and storage resources. Redundancy, fault tolerance, security, and speed are features that add trust and confidence to this adoption. A hurdle in running complex algorithms, including nonlinear optimizations, is the time constraint. Problems such as getting stuck at a local minimum of the solution landscape yield suboptimal solutions. Improved cloud computing infrastructure reduces the computational expense of finding better solutions to everyday problems. To measure big data, Laney (2001) introduced the 3Vs: volume, velocity, and variety.
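The local-minimum problem can be illustrated with a simple random-restart strategy on a one-dimensional nonconvex function. The objective, learning rate, and restart count below are illustrative assumptions, not from the chapter; cheaper compute makes it practical to run many such restarts in parallel:

```python
import math
import random

def f(x: float) -> float:
    # A nonconvex objective with multiple local minima.
    return x * x + 10 * math.sin(x)

def gradient_descent(x0: float, lr: float = 0.01, steps: int = 2000) -> float:
    """Plain gradient descent; may stop in whichever local minimum is near x0."""
    x = x0
    for _ in range(steps):
        grad = 2 * x + 10 * math.cos(x)  # derivative of f
        x -= lr * grad
    return x

def random_restarts(n: int = 20) -> float:
    """Restart from n random points and keep the best result found."""
    random.seed(0)
    return min((gradient_descent(random.uniform(-10, 10)) for _ in range(n)),
               key=f)

x_best = random_restarts()
print(x_best, f(x_best))  # near the global minimum around x ~ -1.31
```

A single descent run can settle in the shallow local minimum near x ≈ 3.8; restarting from many points makes finding the global basin overwhelmingly likely.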

Key Terms in this Chapter

Autonomous System: A self-learning system capable of using real-time information to make goal-based decisions.

Nonce: A generated pseudo-random number that plays a critical role in blockchain technology.
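The nonce's role can be illustrated with a toy proof-of-work search. This is a simplified sketch (real blockchains use binary difficulty targets and far higher difficulty); the function name and block data are illustrative:

```python
import hashlib

def mine(block_data: str, difficulty: int = 4) -> int:
    """Brute-force search for a nonce such that SHA-256(block_data + nonce)
    begins with `difficulty` hex zeros -- the core of proof-of-work mining."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce  # this nonce makes the block's hash meet the target
        nonce += 1

print(mine("example block"))
```

Because hash output is unpredictable, the only way to satisfy the target is to try nonces one after another, which is what makes the resulting block expensive to forge.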

Heteromation: Machines compensating for human effort.

Redlining: A discriminatory practice in which financial institutions withhold services from residents of particular communities; widely considered biased, derogatory, and illegal.

Byzantine Generals Problem: The problem decentralized systems face in propagating trust when some system nodes go rogue.

Digital Footprint: A trail of data left by a user of smart devices.

Quantum Annealing: Solving complex optimization problems using quantum fluctuations.
