At the Eyes-Off Data Summit in Dublin, Reza Shokri from the National University of Singapore joined us virtually and gave an insightful talk on the crucial interplay between data privacy and AI.
This article presents Reza’s perspective on the relationship between large datasets, AI insights, and privacy concerns. It emphasises the importance of handling sensitive information cautiously to avoid breaches, and outlines the quantitative tools we can use to do so.
Direct and Indirect Privacy Risks
Although privacy regulations such as the GDPR mandate data protection impact assessments, practical concerns often centre on data collection, sharing, and access control.
There are two categories of risks: direct privacy risks, which involve unauthorised access to user data, and indirect privacy risks, which occur when the algorithm’s output, the model, reveals sensitive information.
There is a valid concern that models inadvertently memorise individual records from their training sets, which becomes a significant risk if an attacker exploits it.
To mitigate these risks, Reza advocated that organisations and researchers employ systematic, quantitative methods alongside popular open-source tools for auditing privacy in machine learning.
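One simple quantitative audit of this kind is a loss-threshold membership inference test: if a model's loss on its training records is systematically lower than on unseen records, an attacker can tell who was in the training set. The sketch below is illustrative only; the Gaussian "losses" and the threshold are synthetic stand-ins, not output from any real model or tool.

```python
import numpy as np

def membership_advantage(member_losses, nonmember_losses, threshold):
    """Guess 'member' whenever a record's loss falls below the threshold,
    then report the attacker's advantage (true-positive rate minus
    false-positive rate). An advantage near 0 means little leakage."""
    tpr = np.mean(np.asarray(member_losses) < threshold)
    fpr = np.mean(np.asarray(nonmember_losses) < threshold)
    return tpr - fpr

# Synthetic losses: a memorising model fits its training records
# (members) noticeably better than fresh records (non-members).
rng = np.random.default_rng(0)
member_losses = rng.normal(loc=0.2, scale=0.1, size=1000)
nonmember_losses = rng.normal(loc=0.8, scale=0.2, size=1000)

advantage = membership_advantage(member_losses, nonmember_losses, threshold=0.5)
```

A real audit would sweep the threshold to trace an ROC curve and use losses from an actual trained model; open-source tools in this space, such as ML Privacy Meter, automate this kind of analysis.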
Evaluating Models and Identifying Vulnerabilities
While federated learning mitigates direct privacy risks by enabling shared model training without exchanging raw data, the model updates that participants share can still leak considerable information about their data. It is therefore important to employ sound auditing practices to ensure that models trained on personal data are appropriately protected.
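To make concrete why federated learning avoids the direct risk, here is a minimal FedAvg-style sketch: each client runs a gradient step on its own data, and only the resulting weights travel to the server. Everything here (the least-squares task, client count, learning rate) is invented for illustration; note that the shared updates are still functions of the private data, which is exactly where the indirect risk comes from.

```python
import numpy as np

def local_update(global_weights, client_data, lr=0.1):
    """One least-squares gradient step on a client's private data.
    Only the resulting weight vector leaves the device, never the raw records."""
    X, y = client_data
    grad = X.T @ (X @ global_weights - y) / len(y)
    return global_weights - lr * grad

def federated_average(updates):
    """Server aggregates client weight vectors by simple averaging (FedAvg)."""
    return np.mean(updates, axis=0)

# Four clients, each holding a private shard of a toy regression problem.
rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(4):
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ true_w + rng.normal(scale=0.01, size=50)))

w = np.zeros(2)
for _ in range(200):           # communication rounds
    updates = [local_update(w, c) for c in clients]
    w = federated_average(updates)
```

The server recovers a good global model without ever seeing a client's `(X, y)` pairs, yet each shared update encodes statistics of that client's data.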
The modern AI landscape demands an anticipatory approach to data privacy. Large language models have been known to regurgitate sensitive information from their training sets, creating privacy vulnerabilities.
Therefore, conducting pre-deployment assessments to evaluate a model's potential privacy risks and determine suitable actions is critical. These assessments should be run both before and after applying mitigation techniques, so that their effect on the risk can be measured; they are integral to privacy risk analysis in AI.
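One way to make such a before-and-after assessment concrete is to re-run the same privacy test on the mitigated model and compare the attacker's advantage. The sketch below uses synthetic losses and crude additive noise as a stand-in for a real mitigation such as differentially private training; all numbers are illustrative.

```python
import numpy as np

def attack_advantage(member_scores, nonmember_scores, threshold):
    """Advantage of a threshold attacker: true-positive rate on members
    minus false-positive rate on non-members."""
    tpr = np.mean(member_scores < threshold)
    fpr = np.mean(nonmember_scores < threshold)
    return tpr - fpr

rng = np.random.default_rng(2)
member = rng.normal(0.2, 0.1, 5000)      # losses on training records
nonmember = rng.normal(0.8, 0.2, 5000)   # losses on unseen records

risk_before = attack_advantage(member, nonmember, threshold=0.5)

# Crude stand-in for a mitigation: noise large enough that member and
# non-member losses overlap heavily. A real pipeline would retrain with
# differential privacy rather than perturb losses after the fact.
risk_after = attack_advantage(member + rng.normal(0, 1.0, 5000),
                              nonmember + rng.normal(0, 1.0, 5000),
                              threshold=0.5)
```

Comparing `risk_before` and `risk_after` gives a single quantitative signal for whether the mitigation actually reduced what an attacker can learn.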