Workshop

PPML: Machine learning on data you cannot see

Friday, May 24

11:00 - 13:05
Room: Mozzarella
Language: English
Audience level: Intermediate
Sold out
    Elevator pitch

    Machine Learning models can be exploited to leak sensitive data used during training. In this tutorial we will explore Privacy-preserving machine learning (PPML) methods, which promise to overcome these issues by training ML models with full privacy guarantees.

    Abstract

    Privacy guarantees are the most crucial requirement when analysing sensitive data. These requirements can be so stringent that they become a real barrier for the entire pipeline. The reasons are manifold: data often cannot be shared, nor moved from the silos where they reside, let alone analysed in raw form. As a result, data anonymisation techniques are sometimes used to generate a sanitised version of the original data. However, these techniques alone are not enough to guarantee that privacy will be completely preserved. Moreover, the memorisation effect of Deep learning models can be maliciously exploited to attack the models and reconstruct sensitive information about the samples used in training, even if this information was never explicitly provided.
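
    To make the memorisation threat concrete, below is a minimal sketch of a loss-threshold membership inference attack. The dataset, model, and threshold are illustrative assumptions, not the workshop's material: the attacker simply guesses that low-loss samples were part of the training set.

        # Minimal loss-threshold membership inference sketch (illustrative only).
        # An overfitted model assigns lower loss to its own training samples,
        # so an attacker can guess membership by thresholding per-sample loss.
        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split

        X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
        X_in, X_out, y_in, y_out = train_test_split(X, y, test_size=0.5, random_state=0)

        # Deliberately overfit so the memorisation effect is visible.
        model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_in, y_in)

        def per_sample_loss(model, X, y):
            # Cross-entropy of the true label for each sample.
            proba = np.clip(model.predict_proba(X)[np.arange(len(y)), y], 1e-12, 1.0)
            return -np.log(proba)

        loss_in = per_sample_loss(model, X_in, y_in)     # members
        loss_out = per_sample_loss(model, X_out, y_out)  # non-members

        # Attack: predict "member" whenever the loss is below a threshold.
        losses = np.concatenate([loss_in, loss_out])
        truth = np.arange(len(losses)) < len(loss_in)
        guesses = losses < np.median(losses)
        print(f"attack accuracy: {(guesses == truth).mean():.2f} (0.5 = random)")

    A non-trivial attack accuracy (well above 0.5) is exactly the leakage the abstract warns about.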

    Privacy-preserving machine learning (PPML) methods hold the promise to overcome all of these issues, allowing machine learning models to be trained with full privacy guarantees.

    This workshop is organised in three parts. In the first part, we will introduce the main privacy threats to data and to machine learning models (e.g. membership inference attacks). In the second part, we will work our way towards differential privacy: what it is, how it works, and how it can be used with machine learning. Lastly, we will conclude the tutorial by considering more complex ML scenarios: training Deep learning networks on encrypted data, and specialised distributed settings for remote analytics.
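
    As a taste of the second part, here is a minimal sketch of the Laplace mechanism, the standard first example of differential privacy; the query and the epsilon values are illustrative assumptions.

        # Minimal Laplace-mechanism sketch: an epsilon-differentially-private count.
        # A counting query has sensitivity 1 (adding or removing one person changes
        # the count by at most 1), so Laplace(1/epsilon) noise suffices for epsilon-DP.
        import numpy as np

        rng = np.random.default_rng(0)

        def dp_count(values, predicate, epsilon):
            # Release a differentially private count of values satisfying predicate.
            true_count = sum(predicate(v) for v in values)
            sensitivity = 1.0
            return true_count + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

        ages = rng.integers(18, 90, size=10_000)
        for epsilon in (0.1, 1.0, 10.0):
            noisy = dp_count(ages, lambda a: a > 65, epsilon)
            print(f"epsilon={epsilon:>4}: noisy count = {noisy:.1f}")
        # Smaller epsilon -> stronger privacy -> noisier released answer.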

    Tentative Outline (180 mins)

    • Introduction: Brief Intro to PPML and to the workshop (10 mins)

    • Part 1: Programming Privacy (60 mins)

      • De-identification
      • k-anonymity and its limitations
      • Introduction to Differential Privacy
    • Break (10 mins)

    • Part 2: Strengthening Deep Neural Networks (50 mins)

      • ML Model vulnerabilities: Adversarial Examples and inference attacks
      • DL training with Differential Privacy (a minimal sketch follows this outline)
    • Break (5 mins)

    • Part 3: Primer on Privacy-Preserving Machine Learning (40 mins)

      • DL training on (Homomorphically) Encrypted Data
      • Federated Learning and Intro to Remote Data Science
    • Closing Remarks (5 mins)
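
    As referenced in Part 2 above, here is a minimal NumPy sketch of DP-SGD (per-example gradient clipping plus calibrated Gaussian noise) for logistic regression. All hyperparameters are illustrative assumptions; a real session would rely on a library such as Opacus, which also tracks the cumulative privacy budget.

        # Minimal DP-SGD sketch for logistic regression (illustrative only):
        # clip each per-example gradient to L2 norm C, then add Gaussian noise
        # with standard deviation sigma * C to the summed gradient.
        import numpy as np

        rng = np.random.default_rng(0)

        # Toy linearly separable data.
        n, d = 1000, 5
        X = rng.normal(size=(n, d))
        y = (X @ rng.normal(size=d) > 0).astype(float)

        w = np.zeros(d)
        clip_norm = 1.0         # C: per-example gradient clipping bound
        noise_multiplier = 1.1  # sigma: noise scale relative to C
        lr, batch_size = 0.5, 100

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        for step in range(200):
            idx = rng.choice(n, size=batch_size, replace=False)
            Xb, yb = X[idx], y[idx]
            # Per-example gradients of the logistic loss: (sigmoid(x.w) - y) * x.
            grads = (sigmoid(Xb @ w) - yb)[:, None] * Xb
            # Clip each example's gradient to L2 norm at most clip_norm.
            norms = np.linalg.norm(grads, axis=1, keepdims=True)
            grads = grads / np.maximum(1.0, norms / clip_norm)
            # Sum, add calibrated Gaussian noise, then average and step.
            noisy_sum = grads.sum(axis=0) + rng.normal(scale=noise_multiplier * clip_norm, size=d)
            w -= lr * noisy_sum / batch_size

        accuracy = ((sigmoid(X @ w) > 0.5) == y.astype(bool)).mean()
        print(f"train accuracy with DP-SGD: {accuracy:.2f}")

    The clipping bound limits any single sample's influence on an update, and the noise masks what remains: that is the core idea behind training deep networks with differential privacy.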

    Tags: Privacy, Machine-Learning, Algorithms
    Participant

    Valerio Maggio

    Valerio Maggio is a Data Scientist, SSI fellow, and Community Advocate at OpenMined. He holds a Ph.D. in Computer Science, and his research interests span a broad range of topics in data science, from data processing to reproducible machine learning analytics. Before joining Anaconda, Valerio worked in the Higher Education sector, holding appointments as Senior Research Associate for Data Science and Artificial Intelligence at the University of Bristol and at Fondazione Bruno Kessler (Italy). Valerio is also an open-source contributor and an active member of the Python community. Over the last 12 years he has contributed to and volunteered at many international conferences and community meetups, such as PyCon Italy, PyData, EuroPython, and EuroSciPy.