What is Differential Privacy?

Differential privacy is a technique for protecting individual privacy in data analysis, and it’s used by major technology companies such as Apple and Google. The goal of differential privacy is simple: allow data analysts to build accurate models without sacrificing the privacy of the individuals behind the data points.

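But what does “sacrificing the privacy of the data points” mean? Well, let’s think about an example. Suppose I have a dataset that contains information (age, gender, treatment, marital status, other medical conditions, etc.) about every person who was treated for breast cancer at Hospital X. In this case, a data point is a row of a spreadsheet (or DataFrame, database table, etc.) containing information about one person who was treated for breast cancer. Sacrificing that person’s privacy would mean letting an analyst, or anyone who sees the results, learn something specific about them, such as whether they were treated at Hospital X or what other medical conditions they have.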

Now let’s say I want to build a machine learning model that predicts the odds of surviving breast cancer at Hospital X. How can I do that while preserving the privacy of the patients?

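This is where differential privacy comes in. Differential privacy helps analysts build accurate models without sacrificing the privacy of individual data points by introducing carefully calibrated randomness into the process of data retrieval. The noise is chosen so that the answers the analyst receives look almost the same whether or not any single patient’s record is in the dataset, which means the answers reveal very little about that patient.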
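To make “introducing randomness” concrete, here is a minimal sketch of the Laplace mechanism, a standard differential privacy mechanism that the text above does not name explicitly, applied to a counting query such as “how many patients had surgery?”. The patient records, field names, and the `private_count` and `laplace_noise` helpers are hypothetical. The privacy parameter ε (epsilon) controls the trade-off: smaller values add more noise and give stronger privacy.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two independent Exponential variables with rate
    1/scale is Laplace-distributed with location 0 and the given scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Answer 'how many records satisfy predicate?' with epsilon-DP.

    Adding or removing one person changes a count by at most 1, so the
    query's sensitivity is 1 and the Laplace scale is 1 / epsilon.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Hypothetical toy records standing in for the Hospital X dataset.
patients = [
    {"age": 45, "treatment": "surgery"},
    {"age": 62, "treatment": "chemotherapy"},
    {"age": 51, "treatment": "surgery"},
    {"age": 38, "treatment": "radiation"},
]

rng = random.Random(0)
noisy = private_count(patients, lambda p: p["treatment"] == "surgery", epsilon=0.5, rng=rng)
print(f"noisy count of surgery patients: {noisy:.2f}")
```

The analyst only ever sees the noisy number; the true count (2 in this toy data) stays with the curator.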

In the framework of differential privacy, there are two actors: a curator and a data analyst. The curator holds all of the data, with all of the original (true) values, and the data analyst wants to interpret that data. The data analyst can “query” the database; that is, the data analyst can ask the curator for information about a subset of the data. Differential privacy never hands over the true values directly. Instead, for every variable the data analyst is interested in, such as age, diagnosis, or treatment, the analyst might get the true value, or might get a randomized one. How likely the analyst is to get the true value depends on how much noise is introduced: more noise means stronger privacy but less accurate answers.
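The “might get the true value, or might not” behavior closely matches randomized response, a classic technique (not named in the original) for a single yes/no variable. A minimal sketch, assuming a hypothetical parameter `p_truth` for how often the true value is reported: otherwise the reported answer is a fair coin flip. Because the noise process is known, the analyst can still estimate the population rate accurately in aggregate, even though no individual answer can be trusted.

```python
import math
import random

def randomized_response(true_value: bool, p_truth: float, rng: random.Random) -> bool:
    """Report true_value with probability p_truth, otherwise a fair coin flip."""
    if rng.random() < p_truth:
        return true_value
    return rng.random() < 0.5

def epsilon_for(p_truth: float) -> float:
    """Privacy loss of randomized_response.

    P(report yes | truly yes) = p + (1 - p) / 2 = (1 + p) / 2
    P(report yes | truly no)  = (1 - p) / 2
    epsilon is the log of the ratio of those two probabilities.
    """
    return math.log((1 + p_truth) / (1 - p_truth))

def estimate_true_rate(reports, p_truth: float) -> float:
    """Unbiased estimate of the true 'yes' fraction from the noisy reports.

    observed = p_truth * true_rate + (1 - p_truth) / 2, solved for true_rate.
    """
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth) / 2) / p_truth

# Hypothetical population in which about 30% have some yes/no condition.
rng = random.Random(42)
truth = [rng.random() < 0.3 for _ in range(100_000)]
reports = [randomized_response(t, 0.5, rng) for t in truth]

print(f"epsilon = {epsilon_for(0.5):.3f}")  # ln(3), about 1.099
print(f"estimated rate = {estimate_true_rate(reports, 0.5):.3f}")  # should be close to 0.3
```

With `p_truth=0.5`, each individual can plausibly deny their reported answer, yet the aggregate estimate is still useful.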

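The more times you query the database, the less private the data is, because every answer leaks a little information and an analyst can combine many noisy answers (for example, by averaging them) to cancel out the noise. In practice, each query spends part of a fixed “privacy budget”, usually written ε, and the curator stops answering once the budget is used up. There’s a lot more to differential privacy than I just covered, but that’s the basic approach.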
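Here is a small sketch of why repeated queries erode privacy. It reuses Laplace noise on a count and adds a hypothetical `PrivacyBudget` helper based on basic sequential composition, where the ε values of individual queries simply add up. The true count of 2 is an assumed value held by the curator; none of these names come from a real library.

```python
import random

class PrivacyBudget:
    """Track cumulative epsilon under basic sequential composition.

    Hypothetical helper for illustration: answering k queries that are
    eps_1, ..., eps_k differentially private is (eps_1 + ... + eps_k)-DP.
    """

    def __init__(self, total_epsilon: float):
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon: float) -> None:
        # Refuse the query if it would push spending past the total budget.
        if self.spent + epsilon > self.total_epsilon:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon

def noisy_count(true_count: int, epsilon: float, budget: PrivacyBudget, rng: random.Random) -> float:
    """Laplace mechanism for a count (sensitivity 1), charged to the budget."""
    budget.charge(epsilon)
    scale = 1.0 / epsilon
    # Difference of two Exponential(rate 1/scale) draws is Laplace(0, scale).
    return true_count + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

rng = random.Random(7)
true_count = 2  # hypothetical true answer held by the curator

# Without a budget cap, averaging many answers to the same query
# cancels the noise and recovers the true count.
unlimited = PrivacyBudget(total_epsilon=float("inf"))
answers = [noisy_count(true_count, 0.5, unlimited, rng) for _ in range(10_000)]
print(f"average of {len(answers)} answers: {sum(answers) / len(answers):.2f}")
print(f"total epsilon spent: {unlimited.spent:.1f}")  # 10000 * 0.5 = 5000.0

# With a cap, the curator stops answering once the budget is used up.
capped = PrivacyBudget(total_epsilon=1.0)
noisy_count(true_count, 0.5, capped, rng)
noisy_count(true_count, 0.5, capped, rng)
try:
    noisy_count(true_count, 0.5, capped, rng)
except RuntimeError as err:
    print(err)  # privacy budget exhausted
```

Basic composition is the simplest and most conservative accounting rule; tighter accounting methods exist, but they all reflect the same idea that privacy loss accumulates with every query.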


Applications of Differential Privacy to European Privacy Law (GDPR) and Machine Learning