Machine learning models can be exploited to leak sensitive data used during training. In this tutorial we will explore privacy-preserving machine learning (PPML) methods, which promise to overcome these issues by training ML models with strong privacy guarantees.
Privacy guarantees are the most crucial requirement when analysing sensitive data. These requirements can be so stringent that they become a real barrier for the entire pipeline: often the data cannot be shared or moved from the silos where they reside, let alone analysed in their raw form. As a result, data anonymisation techniques are sometimes used to generate a sanitised version of the original data. However, these techniques alone are not enough to guarantee that privacy will be fully preserved. Moreover, the memorisation effect of deep learning models can be maliciously exploited to attack the models and reconstruct sensitive information about the samples used in training, even if that information was never explicitly provided.
Privacy-preserving machine learning (PPML) methods promise to overcome all these issues, allowing machine learning models to be trained with strong privacy guarantees.
This workshop is organised in three main parts. In the first part, we will introduce the main privacy threats to data and machine learning models (e.g. the membership inference attack). In the second part, we will work our way towards differential privacy: what it is, how it works, and how it can be used with machine learning. Lastly, we will conclude the tutorial by considering more complex ML scenarios: training deep neural networks on encrypted data, with specialised distributed settings for remote analytics.
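To give a flavour of the differential privacy material covered in the second part, here is a minimal sketch of the Laplace mechanism, the classic building block for differentially private queries. The function name, dataset, and parameter values below are illustrative assumptions, not taken from the workshop materials.

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release a noisy statistic satisfying epsilon-differential privacy.

    Noise is drawn from a Laplace distribution with scale sensitivity/epsilon,
    so adding or removing one individual changes the output distribution by
    at most a factor of exp(epsilon).
    """
    rng = rng if rng is not None else np.random.default_rng()
    scale = sensitivity / epsilon
    return true_value + rng.laplace(loc=0.0, scale=scale)

# Example: privately release a count query over a toy dataset.
ages = np.array([34, 45, 29, 51, 38, 62, 27])
true_count = int(np.sum(ages > 30))  # a count query has sensitivity 1
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5)
```

A smaller epsilon means a stronger privacy guarantee but noisier answers; the workshop explores this trade-off in practice.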
Introduction: Brief Intro to PPML and to the workshop (10 mins)
Part 1: Programming Privacy (60 mins)
Break (10 mins)
Part 2: Strengthening Deep Neural Networks (50 mins)
Break (5 mins)
Part 3: Primer on Privacy-Preserving Machine Learning (40 mins)
Closing Remarks (5 mins)
Valerio Maggio is a Data Scientist, SSI fellow, and Community Advocate at OpenMined. He holds a Ph.D. in Computer Science, and his research interests span a broad range of topics in data science, from data processing to reproducible machine learning analytics. Before joining Anaconda, Valerio worked in the Higher Education sector, holding appointments as Senior Research Associate for Data Science and Artificial Intelligence at the University of Bristol and at Fondazione Bruno Kessler (Italy). Valerio is also an open-source contributor and an active member of the Python community. Over the last 12 years he has contributed to and volunteered at many international conferences and community meetups, including PyCon Italy, PyData, EuroPython, and EuroSciPy.