A Blockchain-Based Federated Learning: Concepts and Applications

Ankit Khushal Barai (Department of CSE, Indian Institute of Information Technology, Nagpur, India), Robin Singh Bhadoria (Department of Computer Science and Engineering, Hindustan College of Science and Technology, India), Jyotshana Bagwari (Department of CSE, Uttarakhand Technical University, India) and Ivan A. Perl (ITMO University, Russia)
DOI: 10.4018/978-1-7998-5876-8.ch008

Abstract

Conventional machine learning (ML) requires training data to be centralized on a single machine or in a datacenter. Healthcare, finance, and other institutions where data sharing is prohibited need an approach for training ML models within a secure architecture. Recently, techniques such as federated learning (FL), MIT Media Lab's split neural networks, and blockchain have aimed to address data privacy and regulation. However, there are differences between the design principles of FL and the requirements of institutions such as healthcare and finance. These institutions need blockchain-orchestrated FL with the following features: clients holding local data can define access policies for that data and specify how updated weights are encrypted between the workers and the aggregator using blockchain technology; the system keeps audit trail logs of the actions undertaken within the network; and the actual list of participants remains hidden. This is expected to remove barriers in a range of sectors including healthcare, finance, security, logistics, governance, operations, and manufacturing.

1. Introduction

Supervised deep learning algorithms (LeCun et al., 2015) perform well on a variety of image classification tasks. The typical approach to these tasks comprises three steps:

  1. Centralize a large data repository.

  2. Acquire ground truth annotations (labels) for these data.

  3. Use the ground truth annotations to train deep learning (DL) networks for classification.

However, this methodology poses significant practical challenges. In particular, data privacy and security concerns make it difficult to create large central data repositories for training. Two years ago, Google proposed a decentralized "federated learning" technique (Yang et al., 2019) to train deep learning models across multiple data sources without sharing sensitive information.
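The idea of training across several data sources without pooling the data can be illustrated with a small sketch. It assumes a weighted-averaging scheme in the style of federated averaging and a toy linear model; the function names (`local_update`, `federated_average`, `run_rounds`) and all parameter values are hypothetical and chosen only for illustration, not taken from the chapter.

```python
from typing import Sequence, Tuple

Weights = Tuple[float, float]             # (slope, intercept) of y ~ w*x + b
Dataset = Sequence[Tuple[float, float]]   # (x, y) pairs held privately by one client


def local_update(weights: Weights, data: Dataset,
                 lr: float = 0.1, epochs: int = 5) -> Weights:
    """Run a few gradient-descent steps on one client's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> Tuple[float, ...]:
    """Average client weights, weighting each client by its dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return tuple(
        sum(cw[i] * n for cw, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    )


def run_rounds(clients: Sequence[Dataset], rounds: int = 100):
    """Aggregator loop: only weights travel, never the raw (x, y) pairs."""
    global_w = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_w, data) for data in clients]
        global_w = federated_average(updates, [len(d) for d in clients])
    return global_w
```

Calling `run_rounds([client_a, client_b])` with two private datasets returns a single shared model. Only the weight tuples cross the client boundary, which is the property the paragraph above describes.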

Although federated learning has significant advantages over centralized learning, it also has several disadvantages:

  1. All participating clients receive the same final model. This is unfair to clients that contribute more data. For example, a client might claim to have good data but only want the final model. At worst this amounts to DL model theft, and at best to free-riding.

  2. Clients might introduce backdoors (through bad or carefully crafted adversarial data) to corrupt the final model (Bagdasaryan et al., 2018).

Recently, MIT proposed a new approach to federated learning called Split Neural Networks (SplitNN), which can address the above issues. In SplitNN, the final deep learning model is not shared with all clients, and there is no single final deep learning model. Each client computes its own model but still benefits from some "wisdom" derived from the other contributing clients. With SplitNN, each client trains a partial deep learning network up to a specific layer known as the cut layer. The outputs at the cut layer are passed on so that the remaining layers can be computed without exposing the client's raw data.
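The cut-layer idea can be sketched in a few lines. This is a minimal illustration, assuming a configuration in which a second party (called `Server` here) holds the layers after the cut layer and also has access to the labels. The class names, the one-unit layers, and the learning rate are all hypothetical and are not taken from the SplitNN work itself.

```python
import math


class Client:
    """Holds the raw inputs and the layers up to the cut layer (one tanh unit)."""

    def __init__(self, w: float = 0.5):
        self.w = w

    def forward(self, x: float) -> float:
        self.x = x
        self.h = math.tanh(self.w * x)
        return self.h  # only this cut-layer activation leaves the client

    def backward(self, grad_h: float, lr: float) -> None:
        # Chain rule through tanh: dh/dw = (1 - h^2) * x
        self.w -= lr * grad_h * (1.0 - self.h ** 2) * self.x


class Server:
    """Holds the layers after the cut layer (one linear unit); never sees x."""

    def __init__(self, v: float = 0.5):
        self.v = v

    def train_step(self, h: float, y: float, lr: float):
        err = self.v * h - y
        loss = err ** 2
        grad_v = 2.0 * err * h
        grad_h = 2.0 * err * self.v  # gradient returned to the client
        self.v -= lr * grad_v
        return loss, grad_h


def train(client: Client, server: Server, data, lr: float = 0.1, epochs: int = 200):
    """Alternate client forward, server step, client backward; return mean loss per epoch."""
    history = []
    for _ in range(epochs):
        total = 0.0
        for x, y in data:
            h = client.forward(x)
            loss, grad_h = server.train_step(h, y, lr)
            client.backward(grad_h, lr)
            total += loss
        history.append(total / len(data))
    return history
```

In this sketch the only values exchanged are the cut-layer activation `h` and its gradient `grad_h`. Neither the raw input `x` nor the client's weight `w` is ever sent to the other party.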

Blockchain is a technology that addresses privacy and enforces trust in uncontrolled environments such as the healthcare and financial industries. This chapter reviews work on how blockchain and federated learning can be combined to get the best of both technologies and address the underlying issues. A blockchain is a sequential data structure that stores data in blocks linked to each other by hashing. There is no limit on the size of the chain, so it keeps growing as new blocks are added. Any change to block b_i would require changing every block that follows b_i, which is what makes the blockchain secure: it is nearly impossible to change that many blocks without other stakeholders noticing. This distributed ledger is replicated across all nodes, that is, every node holds a copy of the ledger. Blockchain was initially used for cryptocurrencies, but it soon found applications in a number of domains that require trust. In short, a blockchain is a digital, decentralized record that tracks all transactions that happen over a shared system. It is an interlinked and continuously growing list of records stored securely across multiple interconnected systems.
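The hash-linking property can be made concrete with a short sketch. It assumes SHA-256 and a JSON-serialized block layout, neither of which the chapter specifies. The field names and the helpers `block_hash`, `append_block`, and `first_invalid_block` are hypothetical, and consensus and replication across nodes are left out.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index: int, data: str, prev_hash: str) -> str:
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: list, data: str) -> None:
    """Add a block that is linked to the current tail of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV
    index = len(chain)
    chain.append({
        "index": index,
        "data": data,
        "prev_hash": prev_hash,
        "hash": block_hash(index, data, prev_hash),
    })


def first_invalid_block(chain: list):
    """Return the index of the first block that fails verification, or None."""
    prev_hash = GENESIS_PREV
    for block in chain:
        recomputed = block_hash(block["index"], block["data"], block["prev_hash"])
        if block["prev_hash"] != prev_hash or block["hash"] != recomputed:
            return block["index"]
        prev_hash = block["hash"]
    return None
```

If the data in block b_1 is edited, verification fails at b_1. If the attacker then recomputes b_1's hash, verification fails at b_2 instead, because b_2 still stores the old hash. The same repair would have to be repeated for every later block.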

The challenges addressed by the convergence of federated learning and blockchain are:

  1. Privacy problem: how entities can train a model without disclosing their data.

     a. Influence problem: third parties can influence the way an AI model behaves.

  2. Economic problem: how to correctly incentivize third parties to contribute to the knowledge of AI models.

  3. Transparency problem: whether the AI model's behavior is visible to all parties.

  4. Latency problem: centralized AI is unsuitable for use cases where the AI must interact with the real world in real time.

Key Terms in this Chapter

Split Neural Network (SplitNN): Split learning is a technique developed by the MIT Media Lab's Camera Culture group that allows ML models to be trained without sharing any raw data and addresses the drawbacks of federated learning.

Federated Learning: Federated learning is a technology that enables distributed client devices to train AI models without sharing their data.

Blockchain: Blockchain is a technology that addresses privacy and enforces trust in uncontrolled environments such as the healthcare and financial industries. It can be combined with federated learning to get the best of both technologies. A blockchain is a sequential data structure that stores data in blocks linked to each other by hashing.

Machine Learning: Machine learning is a subset of artificial intelligence in which machines learn from data rather than being explicitly programmed.
