05 Sept 2023

Stefano Braghin (IBM Research): A Pragmatic View of PETs in Regulated Environments


At the Eyes-Off Data Summit in Dublin, we had the honour of hosting Stefano Braghin from IBM Research. His talk was filled with critical insights on data privacy, emerging PETs, and various often-overlooked aspects of these topics.


This article offers a comprehensive perspective on his talk, emphasising the importance of collaboration in overcoming challenges and unlocking the potential of privacy-enhancing technologies (PETs) amidst the evolving landscape.


The Expanding Data Privacy Landscape

Data privacy is growing rapidly as a field: projections suggest that roughly 75% of the global population will be covered by privacy regulations by 2024. This growth demands greater collaboration, standardisation, and innovation within the PETs community.


While regulations like the GDPR provide a framework, translating their guidelines into working technology is complicated. On top of that, many countries are still negotiating the fundamental principles underpinning their own privacy regulations, which adds further complexity.


Navigating Data Risks and Enhancing Protection

Deciding where privacy protection sits within the data flow (as data is collected, controlled, processed, or consumed) is intricate: each placement affects service utility and carries its own risk of data exposure. When identifying privacy vulnerabilities, it is crucial to understand the methods attackers may use to access data.


Open-source technology is being leveraged to assess privacy threats across the various phases of data processing. At every stage, from directly identifiable data to anonymised data, we must counter the ease with which external datasets can be linked against pseudonymised data to re-identify individuals.


Big data, with its high volume and velocity, demands technological solutions to perform tasks like determining sensitive data and identifying potential data breaches. Understanding the statistical properties of datasets allows us to identify combinations of data that could lead to identification, thereby strengthening privacy protection.
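To make that last point concrete, here is a minimal sketch (in Python, with made-up example records) of measuring how small the groups sharing a combination of quasi-identifiers become, the classic k-anonymity measure: a group of size one means that combination uniquely identifies someone.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group of records sharing the same
    quasi-identifier values: the dataset's k-anonymity level."""
    groups = Counter(
        tuple(r[q] for q in quasi_identifiers) for r in records
    )
    return min(groups.values())

# Toy records: (zip, age, sex) are quasi-identifiers, diagnosis is sensitive.
people = [
    {"zip": "D04", "age": 34, "sex": "F", "diagnosis": "flu"},
    {"zip": "D04", "age": 34, "sex": "F", "diagnosis": "asthma"},
    {"zip": "D08", "age": 51, "sex": "M", "diagnosis": "flu"},
]

# k = 1: the (D08, 51, M) combination isolates a single person, so anyone
# who knows those three attributes can re-identify that record.
print(k_anonymity(people, ["zip", "age", "sex"]))  # prints 1
```

In practice, tooling runs this kind of check over every plausible combination of attributes to flag which ones need generalisation or suppression.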

Balancing Privacy and Utility

Solutions for data privacy are heavily shaped by use-case requirements, including the data itself, its modality, and business needs.


Therefore, it is vital to standardise privacy protection from a technology perspective and consider its impact on utility.


When selecting a technology, speed remains a crucial consideration since data privacy measures should not slow down processes or compromise the quality of services.


Challenges with Synthetic Data and Federated Learning

While anonymisation techniques are numerous, the challenge lies in striking a balance: data anonymised so heavily that it becomes useless, versus data left identifiable and therefore vulnerable.


As technology progresses, it introduces new types of data, including video and audio. The existing privacy-protection methods must evolve to effectively handle these emerging data forms.


One potential approach to data anonymisation involves the use of synthetic data. However, its quality, and thereby its effectiveness, heavily rely on the factors involved in its creation. There’s an inherent risk that such synthetic data may still contain private information, despite the anonymisation attempts.
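One simple, necessary-but-not-sufficient check for that risk is to look for synthetic rows that exactly reproduce a real individual's record. A minimal sketch in Python (the record layout and field names are purely illustrative):

```python
def leaked_records(real, synthetic):
    """Return synthetic rows that exactly reproduce a real record.
    Passing this check does NOT prove privacy; failing it proves leakage."""
    real_keys = {tuple(sorted(r.items())) for r in real}
    return [s for s in synthetic if tuple(sorted(s.items())) in real_keys]

real = [{"age": 34, "zip": "D04"}, {"age": 51, "zip": "D08"}]
synthetic = [{"age": 34, "zip": "D04"}, {"age": 40, "zip": "D02"}]

# The first synthetic row is a verbatim copy of a real record.
print(leaked_records(real, synthetic))
```

Real-world audits go further, e.g. testing near-duplicates and membership-inference attacks, but exact-copy detection is the cheapest first filter.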


The GDPR mandates that models trained on private data are considered personal information — irrespective of the technology used for data generation.


Alternative solutions like federated learning present a different set of challenges. While it can help ensure security and privacy, its successful implementation relies heavily on the trust built among the stakeholders involved.


Regardless of the measures in place to secure data and maintain privacy, this mutual trust, or the lack thereof, remains a significant obstacle to overcome.


Differential Privacy

An interesting development in data privacy is the advent of differential privacy. Although it is a 'heavy' method in terms of data utility (how much practical use the data retains), it offers mathematical privacy guarantees, making it a game-changer in the field.


Central to differential privacy is the inverse relationship between privacy and a parameter known as epsilon: as the value of epsilon decreases, the privacy guarantee strengthens, at the cost of more noise added to the results.


This relationship is key to gauging whether differential privacy fits a particular scenario; larger datasets absorb the added noise better, making them more suitable for the method.
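To make the epsilon trade-off concrete, here is a hedged sketch of the Laplace mechanism for a single count query (pure-Python noise sampling; the dataset size and epsilon values are illustrative, not recommendations):

```python
import math
import random

def laplace_count(true_count, epsilon, sensitivity=1.0):
    """Release a count with Laplace noise of scale sensitivity/epsilon.
    Smaller epsilon means larger noise, hence stronger privacy."""
    scale = sensitivity / epsilon
    u = random.random() - 0.5  # uniform in [-0.5, 0.5)
    # Inverse-CDF sampling of the Laplace distribution.
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

true_count = 10_000
for eps in (0.1, 1.0, 10.0):
    # Expected absolute error is about sensitivity/eps, so the relative
    # error on a large count shrinks as the dataset (or epsilon) grows.
    print(f"eps={eps}: noisy count = {laplace_count(true_count, eps):,.1f}")
```

The same noise of scale 10 (epsilon = 0.1) that barely perturbs a count of 10,000 would swamp a count of 20, which is why large datasets tolerate differential privacy so much better.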


Given the costly nature of implementing differential privacy, Stefano highlighted the importance of practical testing. This hands-on approach would help uncover potential issues and ensure that the privacy method is both effective and economically viable before widespread rollout.

The Right To Be Forgotten

One fundamental aspect often overlooked in terms of the GDPR is the right to be forgotten. This right poses a great challenge, especially in federated learning, where tracing an individual's contribution back through the model is often infeasible: the underlying data may no longer exist, or may have had to be destroyed to meet legal requirements.


Addressing the question of how to erase individual data points often involves a combination of re-creation and measurement. Moreover, the data of someone who withdraws consent may still exist elsewhere.


Federated unlearning, or teaching the model to forget specific data points, is a possible solution. Although still a work in progress, it represents a step forward in addressing the issue.


The Road Ahead

While fully automated solutions may sound attractive, there is a need for a ‘human in the loop’ — someone who understands the law, and can translate it into actionable steps and informed decisions for regulatory and business considerations.


Data privacy is an essential aspect of the AI landscape that demands continuous attention and collaborative efforts. As responsible innovators, it is crucial for data enthusiasts, policy experts, and PET developers to actively participate in the community and navigate the challenges to realise the true potential of emerging technologies.


To learn more about the topics Stefano discussed, we invite you to watch the recording of his talk on our YouTube channel.


2024 Oblivious Software Ltd. All rights reserved.