    11 Sept 2023
  • 4 minute read

Reza Shokri (NUS): Auditing Data Privacy for Machine Learning

Read about Reza’s perspective on the relationship between large datasets, AI insights, and privacy concerns.


At the Eyes-Off Data Summit in Dublin, Reza Shokri from the National University of Singapore joined us virtually and gave an insightful talk on the crucial interplay between data privacy and AI.


This article presents Reza’s perspective on the relationship between large datasets, AI insights, and privacy concerns. It emphasises the importance of handling sensitive information carefully to avoid breaches and introduces the quantitative tools we can use to do so.


Direct and Indirect Privacy Risks

Although privacy regulations such as the GDPR call for data protection impact assessments, practical concerns often centre on data collection, sharing, and access control.


There are two categories of risks: direct privacy risks, which involve unauthorised access to user data, and indirect privacy risks, which occur when the algorithm’s output, the model, reveals sensitive information.


There is a valid concern that models inadvertently encode individual data records from their training sets, which becomes a significant risk if exploited.


To mitigate these risks, Reza advocated that organisations and researchers employ systematic and quantitative methods alongside popular open-source tools that audit privacy in machine learning.


Evaluating Models and Identifying Vulnerabilities

While federated learning mitigates direct privacy risks by enabling shared model training without exchanging data, it can still inadvertently expose a considerable amount of participant data. It is therefore important to employ sound auditing practices to ensure that models trained on personal data are appropriately protected.


The modern AI landscape demands an anticipatory approach to data privacy. Large language models have been known to reproduce sensitive information from their training sets, creating privacy vulnerabilities.


Therefore, conducting pre-deployment assessments of models to evaluate potential privacy risks and determine suitable actions is critical. These analyses should be done both before and after applying mitigation techniques, as they are integral to privacy risk analysis in AI.

Implementing Membership Inference

These observations underscore the need for a robust framework that quantifies and mitigates privacy risks through systematic evaluation.


For instance, imagine a scenario in which an adversary tries to distinguish between two parallel worlds: both use almost identical training sets, but one contains a single additional data point. If the adversary can reliably identify which world it is in, the training algorithm is leaking information about that data point.
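To make the thought experiment concrete, here is a minimal sketch of that game, assuming a simple scikit-learn classifier and a loss-threshold adversary; the data, model, and threshold are illustrative choices, not details from the talk.

```python
# Minimal sketch of the "two worlds" membership inference game described above.
# Everything here (data, model, the loss-threshold adversary) is hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)

def make_data(n, d=20):
    X = rng.normal(size=(n, d))
    y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)
    return X, y

x_target, y_target = make_data(1)        # the single record that differs

trials, correct = 200, 0
for _ in range(trials):
    X_rest, y_rest = make_data(200)      # records shared by both worlds
    world = int(rng.integers(2))         # 0: target excluded, 1: target included
    if world == 1:
        X_rest = np.vstack([X_rest, x_target])
        y_rest = np.concatenate([y_rest, y_target])

    model = LogisticRegression(max_iter=1000).fit(X_rest, y_rest)

    # The adversary only sees the model's behaviour on the target record:
    # an unusually low loss hints that the record was in the training set.
    p = model.predict_proba(x_target)[:, 1]
    loss = log_loss(y_target, p, labels=[0, 1])
    guess = 1 if loss < 0.3 else 0       # threshold chosen for illustration

    correct += int(guess == world)

print(f"adversary wins {correct / trials:.0%} of the time (50% = no leakage)")
```

A win rate stuck near 50% means the model reveals essentially nothing about the extra record, while a win rate approaching 100% indicates strong leakage.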


Membership inference refers to the ability to determine if a particular data point was part of a model’s training set. This type of privacy risk can be measured using re-identification attacks in machine learning.


To identify potential members of the training set, an adversary observes the model’s behaviour on candidate data points and labels each one as a member or a non-member, with some level of accuracy and error. The more reliably the adversary identifies true members of the training data while keeping false identifications low, the greater the privacy risk.
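One common way to turn this into numbers is sketched below: score member and non-member records with the model’s per-record loss, sweep a decision threshold, and report the attack’s trade-off between correct and false identifications. The loss distributions here are synthetic placeholders, purely for illustration.

```python
# Sketch: summarising membership inference risk via true/false positive rates.
# `member_losses` and `nonmember_losses` stand in for the audited model's
# per-record losses on training records and on held-out records.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(1)
member_losses = rng.exponential(scale=0.2, size=1000)     # placeholder values
nonmember_losses = rng.exponential(scale=0.5, size=1000)  # placeholder values

# The attacker predicts "member" when the loss is low, so use -loss as a score.
scores = np.concatenate([-member_losses, -nonmember_losses])
labels = np.concatenate([np.ones(1000), np.zeros(1000)])

auc = roc_auc_score(labels, scores)
fpr, tpr, _ = roc_curve(labels, scores)

# A widely reported risk summary: how many true members the attacker catches
# while keeping false identifications below a small budget (here 1%).
tpr_at_low_fpr = tpr[fpr <= 0.01].max()
print(f"attack AUC = {auc:.3f}, TPR at 1% FPR = {tpr_at_low_fpr:.3f}")
```

An AUC of 0.5 corresponds to random guessing; values well above that, or a high true positive rate at a low false positive rate, indicate that individual records can be singled out.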

Pre-Deployment Assessments of Privacy Risks

Privacy risks can be categorised as low, medium, or high depending on the algorithm, with those used in large language models tending towards the higher end. Prioritising anonymity and blurring individual data points into the general population can protect individuals’ privacy and mitigate the risks posed by membership inference attacks.


Taking a forward-thinking approach to data privacy, we should conduct pre-deployment assessments of our models. This involves evaluating the projected privacy risks and deciding whether to deploy, to investigate further, or to refrain from deploying because the risks are excessive.


This evaluation should occur both before and after applying mitigation techniques, giving a robust measure of privacy risk in AI. Modern AI guidelines emphasise mitigating re-identification risks in machine learning, and work is ongoing to design stronger techniques for quantifying privacy risk.
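As a rough illustration of how such an assessment might gate a release, the snippet below maps an audit metric to the three actions mentioned above; the thresholds are hypothetical placeholders, and any real policy would be set per application and re-checked after mitigation.

```python
# Hypothetical pre-deployment gate driven by a membership inference audit.
# The AUC thresholds are illustrative placeholders, not recommended values.
def deployment_decision(attack_auc: float) -> str:
    """Map an audited attack AUC to a coarse action."""
    if attack_auc < 0.55:    # attack barely better than random guessing
        return "deploy"
    if attack_auc < 0.70:    # noticeable leakage: investigate and mitigate first
        return "investigate"
    return "do not deploy"   # strong leakage: refrain from releasing the model

# Example: audit once before and once after applying a mitigation technique.
print(deployment_decision(0.81))  # before mitigation -> "do not deploy"
print(deployment_decision(0.53))  # after mitigation  -> "deploy"
```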


ML Privacy Meter

The ML Privacy Meter, a tool that operationalises regulatory requirements for privacy risk quantification in machine learning, exemplifies how tooling can support an organisation’s privacy measures.


For instance, the ML Privacy Meter can measure the risk of identifying the presence of specific documents in a training set. It can also highlight the most vulnerable data points, particularly those containing sensitive information.
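The sketch below illustrates the kind of per-record analysis such a tool performs, using a simple population-style comparison of losses; it is not the ML Privacy Meter’s actual API, and the dataset and model are stand-ins.

```python
# Illustrative per-record vulnerability ranking, in the spirit of the reports
# a tool like the ML Privacy Meter produces; this is NOT its actual API.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_informative=10, random_state=0)
X_train, X_out, y_train, y_out = train_test_split(X, y, test_size=0.5, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

def per_record_loss(model, X, y):
    """Cross-entropy loss of the model on each individual record."""
    p = model.predict_proba(X)[np.arange(len(y)), y]
    return -np.log(p + 1e-12)

train_losses = per_record_loss(model, X_train, y_train)
out_losses = per_record_loss(model, X_out, y_out)   # population reference

# A simple population-style attack score: a training record is easy to flag
# as a member when its loss is lower than that of almost every non-member.
percentile_vs_population = (out_losses[None, :] < train_losses[:, None]).mean(axis=1)
most_vulnerable = np.argsort(percentile_vs_population)[:10]
print("most exposed training records:", most_vulnerable)
```

Records that stand out from the population like this are the ones a membership inference attack pinpoints first, so they are natural candidates for closer review or stronger mitigation.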


It is crucial to note that as models become more capable and accurate, privacy risks tend to grow, because more data points are memorised.


In conclusion, granting access to models trained on sensitive and personal data is not much less risky than providing access to the data used to train those models.


With various mitigation techniques available, there is a need to work towards a standard comparison method. The ML Privacy Meter is a step in this direction, but further efforts are required to enhance privacy risk assessment.


To learn more about the topics Reza discussed, we invite you to watch the recording of his talk on the Oblivious YouTube channel.

privacy-enhancing technologies

eyes-off data summit

eods2023

eodsummit2023

data privacy
