Workshop

PPML: Machine learning on data you cannot see

Friday, May 24

11:00 - 13:05
Room: Mozzarella
Language: English
Audience level: Intermediate
Sold out
    Elevator pitch

    Machine Learning models can be exploited to leak sensitive data used during training. In this tutorial we will explore Privacy-preserving machine learning (PPML) methods, which promise to overcome these issues by training ML models with full privacy guarantees.

    Abstract

    Privacy guarantees are the most crucial requirement when analysing sensitive data. These requirements can be so stringent that they become a real barrier for the entire pipeline. The reasons are manifold: data often cannot be shared, nor moved from the silos where they reside, let alone analysed in raw form. As a result, data anonymisation techniques are sometimes used to generate a sanitised version of the original data. However, these techniques alone are not enough to guarantee that privacy will be completely preserved. Moreover, the memorisation effect of Deep learning models can be maliciously exploited to attack the models and reconstruct sensitive information about the samples used in training, even if this information was never explicitly provided.
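
    To make the memorisation threat concrete, below is a minimal sketch of a loss-threshold membership inference attack. The dataset, model, and threshold are illustrative assumptions, not the workshop's material: the attacker simply guesses that low-loss samples were part of the training set.

        # Minimal loss-threshold membership inference sketch (illustrative only).
        # An overfitted model assigns lower loss to its own training samples,
        # so an attacker can guess membership by thresholding per-sample loss.
        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split

        X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
        X_in, X_out, y_in, y_out = train_test_split(X, y, test_size=0.5, random_state=0)

        # Deliberately overfit so the memorisation effect is visible.
        model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_in, y_in)

        def per_sample_loss(model, X, y):
            # Cross-entropy of the true label for each sample.
            proba = np.clip(model.predict_proba(X)[np.arange(len(y)), y], 1e-12, 1.0)
            return -np.log(proba)

        loss_in = per_sample_loss(model, X_in, y_in)     # members
        loss_out = per_sample_loss(model, X_out, y_out)  # non-members

        # Attack: predict "member" whenever the loss is below a threshold.
        losses = np.concatenate([loss_in, loss_out])
        truth = np.arange(len(losses)) < len(loss_in)
        guesses = losses < np.median(losses)
        print(f"attack accuracy: {(guesses == truth).mean():.2f} (0.5 = random)")

    A non-trivial attack accuracy (well above 0.5) is exactly the leakage the abstract warns about.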

    Privacy-preserving machine learning (PPML) methods hold the promise to overcome all of these issues, allowing machine learning models to be trained with full privacy guarantees.

    This workshop is organised in three parts. In the first part, we will introduce the main privacy threats to data and to machine learning models (e.g. membership inference attacks). In the second part, we will work our way towards differential privacy: what it is, how it works, and how it can be used with machine learning. Lastly, we will conclude the tutorial by considering more complex ML scenarios: training Deep learning networks on encrypted data, and specialised distributed settings for remote analytics.
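
    As a taste of the second part, here is a minimal sketch of the Laplace mechanism, the standard first example of differential privacy; the query and the epsilon values are illustrative assumptions.

        # Minimal Laplace-mechanism sketch: an epsilon-differentially-private count.
        # A counting query has sensitivity 1 (adding or removing one person changes
        # the count by at most 1), so Laplace(1/epsilon) noise suffices for epsilon-DP.
        import numpy as np

        rng = np.random.default_rng(0)

        def dp_count(values, predicate, epsilon):
            # Release a differentially private count of values satisfying predicate.
            true_count = sum(predicate(v) for v in values)
            sensitivity = 1.0
            return true_count + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

        ages = rng.integers(18, 90, size=10_000)
        for epsilon in (0.1, 1.0, 10.0):
            noisy = dp_count(ages, lambda a: a > 65, epsilon)
            print(f"epsilon={epsilon:>4}: noisy count = {noisy:.1f}")
        # Smaller epsilon -> stronger privacy -> noisier released answer.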

    Tentative Outline (180 mins)

    • Introduction: Brief Intro to PPML and to the workshop (10 mins)

    • Part 1: Programming Privacy (60 mins)

      • De-identification
      • k-anonymity and its limitations
      • Introduction to Differential Privacy
    • Break (10 mins)

    • Part 2: Strengthening Deep Neural Networks (50 mins)

      • ML Model vulnerabilities: Adversarial Examples and inference attacks
      • DL training with Differential Privacy (a minimal sketch follows this outline)
    • Break (5 mins)

    • Part 3: Primer on Privacy-Preserving Machine Learning (40 mins)

      • DL training on (Homomorphically) Encrypted Data
      • Federated Learning and Intro to Remote Data Science
    • Closing Remarks (5 mins)
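
    As referenced in Part 2 above, here is a minimal NumPy sketch of DP-SGD (per-example gradient clipping plus calibrated Gaussian noise) for logistic regression. All hyperparameters are illustrative assumptions; a real session would rely on a library such as Opacus, which also tracks the cumulative privacy budget.

        # Minimal DP-SGD sketch for logistic regression (illustrative only):
        # clip each per-example gradient to L2 norm C, then add Gaussian noise
        # with standard deviation sigma * C to the summed gradient.
        import numpy as np

        rng = np.random.default_rng(0)

        # Toy linearly separable data.
        n, d = 1000, 5
        X = rng.normal(size=(n, d))
        y = (X @ rng.normal(size=d) > 0).astype(float)

        w = np.zeros(d)
        clip_norm = 1.0         # C: per-example gradient clipping bound
        noise_multiplier = 1.1  # sigma: noise scale relative to C
        lr, batch_size = 0.5, 100

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        for step in range(200):
            idx = rng.choice(n, size=batch_size, replace=False)
            Xb, yb = X[idx], y[idx]
            # Per-example gradients of the logistic loss: (sigmoid(x.w) - y) * x.
            grads = (sigmoid(Xb @ w) - yb)[:, None] * Xb
            # Clip each example's gradient to L2 norm at most clip_norm.
            norms = np.linalg.norm(grads, axis=1, keepdims=True)
            grads = grads / np.maximum(1.0, norms / clip_norm)
            # Sum, add calibrated Gaussian noise, then average and step.
            noisy_sum = grads.sum(axis=0) + rng.normal(scale=noise_multiplier * clip_norm, size=d)
            w -= lr * noisy_sum / batch_size

        accuracy = ((sigmoid(X @ w) > 0.5) == y.astype(bool)).mean()
        print(f"train accuracy with DP-SGD: {accuracy:.2f}")

    The clipping bound limits any single sample's influence on an update, and the noise masks what remains: that is the core idea behind training deep networks with differential privacy.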

    Tags: Privacy, Machine-Learning, Algorithms
    Participant

    Valerio Maggio

    Valerio Maggio is a Data Scientist, SSI fellow, and Community Advocate at OpenMined. He holds a Ph.D. in Computer Science, and his research interests span a broad range of topics in data science, from data processing to reproducible machine learning analytics. Before joining Anaconda, Valerio worked in the Higher Education sector, holding appointments as Senior Research Associate for Data Science and Artificial Intelligence at the University of Bristol and at Fondazione Bruno Kessler (Italy). Valerio is also an open-source contributor and an active member of the Python community. Over the last 12 years he has contributed to and volunteered at many international conferences and community meetups, such as PyCon Italy, PyData, EuroPython, and EuroSciPy.