Privacy Preserving Data Mining as Proof of Useful Work: Exploring an AI/Blockchain Design

Hjalmar K. Turesson, Henry Kim, Marek Laskowski, Alexandra Roatis
DOI: 10.4018/978-1-6684-7132-6.ch024

Abstract

Blockchains rely on consensus among participants to achieve decentralization and security. However, reaching consensus in an online, digital world where identities are not tied to physical users is a challenging problem. Proof-of-work provides a solution by linking representation to a valuable, physical resource. While this has worked well, it consumes a tremendous amount of specialized hardware and energy, with no utility beyond blockchain security. Here, the authors propose an alternative consensus scheme that directs the computational resources to the optimization of machine learning (ML) models, a task with more general utility. This is achieved by a hybrid consensus scheme relying on three parties: data providers, miners, and a committee. The data provider makes data available and pays for the best model; miners compete for the payment and for access to the committee by producing optimized ML models; and the committee controls the ML competition.
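The abstract names the three roles but not the protocol details. The following is a minimal sketch of one hypothetical competition round under loudly stated assumptions: the committee holds a held-out test set from the data provider, each miner submits the weights of a linear model, the committee scores submissions by mean squared error, and the best-scoring miner receives the escrowed payment and a committee seat. The names `Task`, `Submission`, `Committee`, and `run_round` are illustrative and do not come from the chapter; in the chapter's privacy-preserving setting the test data would also be in obfuscated form.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of one ML-competition round; not the authors' protocol.

@dataclass
class Task:
    """A task posted by a data provider (assumed structure)."""
    task_id: str
    reward: int    # payment escrowed by the data provider
    test_x: list   # held-out features, assumed to be kept by the committee
    test_y: list   # held-out targets, assumed to be kept by the committee

@dataclass
class Submission:
    miner: str
    weights: list  # linear model: prediction = dot(weights, x)

@dataclass
class Committee:
    members: list = field(default_factory=list)

    def score(self, task, sub):
        """Mean squared error of a submitted linear model on the held-out data."""
        errors = [
            (sum(w * xi for w, xi in zip(sub.weights, x)) - y) ** 2
            for x, y in zip(task.test_x, task.test_y)
        ]
        return sum(errors) / len(errors)

    def run_round(self, task, submissions, balances):
        """Pick the lowest-error model, pay its miner, and admit the miner to the committee."""
        if not submissions:
            return None
        # min() returns the first minimum, so earlier submissions win ties.
        winner = min(submissions, key=lambda s: self.score(task, s))
        balances[winner.miner] = balances.get(winner.miner, 0) + task.reward
        if winner.miner not in self.members:
            self.members.append(winner.miner)
        return winner.miner

# Usage: two miners compete on a toy task whose targets follow y = 2*x0 + 3*x1.
task = Task("t1", reward=100,
            test_x=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
            test_y=[2.0, 3.0, 5.0])
subs = [Submission("alice", [1.0, 1.0]), Submission("bob", [2.0, 3.0])]
committee = Committee(members=["carol"])
balances = {}
print(committee.run_round(task, subs, balances))  # bob
```

Scoring on data the miners never see is one simple way to keep them from overfitting to the evaluation set; how the actual scheme handles evaluation, ties, and committee turnover is not stated in this preview.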
Chapter Preview

Why Privacy-Preserving Data Mining

The application of machine learning (ML) to important problems in medicine and finance often leads to an apparent contradiction: training the models requires access to large and varied data sets, and industry and regulators expect security and privacy to be preserved, yet the size and scope of the collected data make them attractive to hackers and increase the likelihood of malicious or even unintended privacy breaches. Recent news reports have highlighted data security and privacy failures (Armeding, 2018; Cameron, 2017; Subramanian & Malladi, 2020). To mitigate this seeming contradiction and limit data leaks, a popular scheme obfuscates the raw data and applies machine learning to the transformed data, enabling data-driven discovery (“mining”) of insights while keeping the data private. This scheme, which preserves privacy while maintaining data utility and modeling accuracy, is called privacy-preserving data mining (Thuraisingham, 2005).
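The preview does not say which obfuscation technique the authors use. As one illustration of the general idea, the sketch below uses rotation perturbation, a known privacy-preserving data mining technique in which the data owner multiplies every record by a secret random orthogonal matrix. Because rotation preserves Euclidean distances, a distance-based model such as a 1-nearest-neighbour classifier makes the same predictions on the transformed data as on the raw data. The function names and toy data are hypothetical. Rotation alone is generally considered weak protection against attackers who know some original records, so this shows only the utility-preserving side of the trade-off.

```python
import math
import random

# Illustrative only: rotation perturbation is one example technique,
# not necessarily the one used in the chapter.

def random_orthogonal_matrix(dim, rng):
    """Build a random dim x dim orthogonal matrix via Gram-Schmidt on Gaussian vectors."""
    basis = []
    while len(basis) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Remove the components along the basis vectors chosen so far.
        for b in basis:
            proj = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > 1e-9:  # skip (near-)degenerate draws
            basis.append([vi / norm for vi in v])
    return basis

def perturb(rows, matrix):
    """Rotate each record, x' = R x; only the data owner knows R."""
    return [[sum(r * xj for r, xj in zip(row_r, x)) for row_r in matrix] for x in rows]

def nearest_neighbor_predict(train_x, train_y, query):
    """1-NN classifier using squared Euclidean distance."""
    best = min(range(len(train_x)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)))
    return train_y[best]

# Usage: miners see only the rotated records and rotated queries.
rng = random.Random(0)
train_x = [[1.0, 2.0, 0.5], [0.9, 1.8, 0.4], [5.0, 4.0, 3.0], [5.2, 3.9, 3.1]]
train_y = ["low", "low", "high", "high"]
queries = [[1.1, 2.1, 0.6], [4.8, 4.2, 2.9]]

R = random_orthogonal_matrix(3, rng)
raw_preds = [nearest_neighbor_predict(train_x, train_y, q) for q in queries]
hidden_train = perturb(train_x, R)
safe_preds = [nearest_neighbor_predict(hidden_train, train_y, q) for q in perturb(queries, R)]
print(raw_preds == safe_preds)  # True
```

Here the queries are transformed with the same secret matrix as the training records, so the model never needs the raw values.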

Given the popularity of AI (Siau & Wang, 2020; Wang & Siau, 2019), it is attractive to recast the mathematical puzzle at the heart of a blockchain’s proof-of-work as a data mining problem. However, proof-of-work is most compelling for blockchain use cases in which proof of access to resources serves as a proxy for proof of incorruptibility among untrusted potential validators (Nakamoto, 2018). Bitcoin and Ethereum are blockchain networks that exemplify this “trustless,” “permissionless” context. On such open blockchains, raw, un-obfuscated data clearly cannot be handed to third-party validators (cryptocurrency miners) for data mining: miners may be trusted to perform transparent, straightforward validation, but they cannot be trusted with raw data. Hence, our proof-of-useful-work (PoUW) scheme solves a privacy-preserving data mining problem, not a generic data mining problem on raw data.
