Unlocking Microsoft's Data for Social Good: Mayana Pereira at Eyes-Off Data Summit 2024
This article explores how Pereira's work is helping regulators, researchers, and policymakers make informed decisions and the methods Microsoft employs to ensure data privacy and transparency.
Oct 24, 2024

Mayana Pereira’s presentation at the Eyes-Off Data Summit 2024 brought a fresh perspective on how big data can be leveraged for the public good.
Pereira is a research scientist at Microsoft’s AI for Good Lab. In her talk, she demonstrated how responsible data sharing can unlock transformative insights into pressing social issues, all while maintaining high standards of privacy.
Read on to discover how the strategic release of data can create positive social change and why collaboration with other tech giants is key to maximising its impact.
The Power of Big Data for Social Good
At the AI for Good Lab, Microsoft's mission is to focus on socially relevant problems that can be addressed through artificial intelligence (AI), data science, and privacy-preserving technologies.
Pereira emphasised that opening data is a crucial first step, as it can foster innovation and bring clarity to problems that would otherwise be hard to quantify. Microsoft’s Broadband Data project is a prime example of how open data can be used to tackle the digital divide in the U.S.
The Broadband Data set, which estimates broadband usage across the U.S. at a zip code level, revealed a critical distinction between broadband availability and broadband usage. While infrastructure exists in many regions, a significant portion of the population still does not use the internet at broadband speeds. This gap highlighted a key insight: infrastructure alone is not enough to close the digital divide.
By releasing this dataset, Microsoft empowered regulators, state officials, and researchers to better understand how digital infrastructure was being used and where improvements were needed. Pereira shared that the White House and various state regulators have used the Broadband Data set to inform policy and investment decisions.

Beyond Access: Digital Literacy in Focus
The next challenge Pereira’s team tackled was digital literacy. While broadband access is critical, it’s only part of the equation. True digital equity requires not only access to the internet but also the ability to use it effectively.
To address this, Microsoft developed the Digital Literacy Data set. This dataset focuses on understanding how people use their digital devices, using the variety and intensity of app usage as a proxy for digital literacy. By correlating this data with socioeconomic factors such as income and education, Microsoft was able to identify which communities were struggling with digital literacy.
The Digital Literacy Data set allows stakeholders to see beyond infrastructure and pinpoint the regions where education programs or digital literacy training might be needed. For example, despite having robust broadband infrastructure, areas with lower education levels often showed limited engagement with digital tools, underscoring the importance of focusing on both access and usage to bridge the digital divide.
This approach allowed Pereira's team to generate more nuanced insights about the digital landscape, moving beyond infrastructure toward addressing behavioural patterns and digital education gaps.

This effort is part of Microsoft’s wider Airband initiative, which seeks to close the global digital divide by bringing internet access to underserved communities. This program addresses the infrastructure gap while also ensuring that once connected, communities have the digital skills and resources to thrive.
Microsoft aims to bring internet access to 250 million people globally, including 100 million people across Africa, by 2025.
Safeguarding Privacy with Differential Privacy
While these projects clearly demonstrated the potential of big data to solve social issues, data privacy is a non-negotiable aspect of Microsoft’s approach. Ensuring user privacy is critical when dealing with sensitive information, especially at such a granular level. To accomplish this, Microsoft employs differential privacy.
Differential privacy allows organisations to analyse large datasets while limiting what the published results can reveal about any individual user. Pereira delved into the technical aspects of this method, explaining that Microsoft's privacy design framework is built on the concept of the privacy budget (commonly denoted epsilon), which governs the trade-off between data utility and privacy protection: a smaller budget means more noise is added, giving stronger privacy but less precise results.
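For readers who want the formal statement behind the privacy budget, the standard textbook definition of epsilon-differential privacy is given below. The symbols M, D, D' and S are notation introduced here for illustration and are not taken from the talk.

```latex
\[
\Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon} \cdot \Pr[\,M(D') \in S\,]
\]
```

Here M is the randomised release mechanism, D and D' are any two datasets that differ in one person's records, and S is any set of possible outputs. With an epsilon of 0.1, the factor e^epsilon is roughly 1.105, so adding or removing one person can change the probability of any published result by at most about 10.5 percent.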
In the case of the Broadband Data project, Microsoft used a very strict privacy budget because broadband usage correlates with socioeconomic factors such as income, making it a potentially sensitive dataset. By employing differential privacy with an epsilon value of 0.1, Microsoft was able to ensure that individuals in smaller, more vulnerable zip codes were protected while still providing useful insights to regulators and policymakers.
In contrast, Pereira explained that the Digital Literacy Data set had a more generous privacy budget because the nature of the data (which tracked app usage across zip codes rather than individual behaviour) presented lower privacy risks. This flexible approach, tailored to each dataset’s risk profile, is central to Microsoft’s commitment to privacy by design.
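To make the effect of a strict versus a generous privacy budget concrete, here is a minimal Python sketch. The article does not say which mechanism Microsoft used, so this assumes the textbook Laplace mechanism applied to a simple count. The function names, the sensitivity of 1, the true count of 1,000 and the "generous" epsilon of 1.0 are all hypothetical choices for illustration; only the epsilon of 0.1 comes from the talk.

```python
import random


def noise_scale(epsilon: float, sensitivity: float = 1.0) -> float:
    """Scale of the Laplace noise: larger when the privacy budget is smaller."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count: int, epsilon: float, rng: random.Random,
                  sensitivity: float = 1.0) -> float:
    """Release a count under the Laplace mechanism.

    Assumes (hypothetically) that one person changes the count by at most
    `sensitivity`.
    """
    return true_count + laplace_noise(noise_scale(epsilon, sensitivity), rng)


if __name__ == "__main__":
    rng = random.Random(2024)  # seeded only so the demo is repeatable
    true_count = 1000          # hypothetical households in one zip code
    for label, eps in [("strict (0.1)", 0.1), ("generous (1.0, assumed)", 1.0)]:
        released = private_count(true_count, eps, rng)
        print(f"{label}: noise scale {noise_scale(eps):.1f}, released {released:.1f}")
```

Under these assumptions, the strict budget adds noise with a scale ten times larger than the assumed generous one. That matters little for a large zip code but can noticeably blur the figures for a small one, which is where the extra protection is most needed.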
Process Transparency: A Commitment to Trust
A critical pillar of Pereira’s approach to data release is process transparency. It's about ensuring that the methods used to protect privacy and maintain data utility are fully understood by the public.
To this end, Microsoft provides detailed documentation on how the data was collected, the privacy-preserving techniques used, and the potential error margins introduced by those techniques. For example, the Broadband Data set includes estimates of error ranges, allowing users to assess the reliability of the data based on their specific needs.
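As an illustration of how a published error range might be derived and used, the sketch below computes the error margin implied by Laplace noise at a given confidence level. This is an assumption for illustration, not Microsoft's documented procedure: the Laplace mechanism, the sensitivity of 1, the function names and the 5 percent tolerance are all hypothetical.

```python
import math


def laplace_error_margin(epsilon: float, sensitivity: float = 1.0,
                         confidence: float = 0.95) -> float:
    """Margin t such that |noise| <= t with the given confidence.

    For Laplace noise with scale b, P(|noise| > t) = exp(-t / b),
    so t = b * ln(1 / (1 - confidence)).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    scale = sensitivity / epsilon
    return scale * math.log(1.0 / (1.0 - confidence))


def is_reliable_enough(published_value: float, margin: float,
                       max_relative_error: float) -> bool:
    """Check whether the margin is small relative to the published value."""
    if published_value <= 0:
        return False
    return margin / published_value <= max_relative_error


if __name__ == "__main__":
    margin = laplace_error_margin(epsilon=0.1)
    print(f"95% error margin: +/- {margin:.1f}")
    for value in (100, 5000):  # hypothetical published counts
        ok = is_reliable_enough(value, margin, max_relative_error=0.05)
        print(f"published value {value}: within 5% tolerance? {ok}")
```

With an epsilon of 0.1 and a sensitivity of 1, the 95 percent margin works out to 10 × ln 20, or roughly 30. A user who needs figures accurate to within 5 percent could therefore rely on a published count of 5,000 but not on a count of 100.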
Pereira hopes this transparency will inspire other organisations to adopt similar practices. By providing a clear blueprint for how to safely and effectively release data, Microsoft hopes to empower smaller organisations and NGOs to unlock their own datasets for the public good.
Collaboration to Maximise Impact
While Microsoft’s data is invaluable for certain analyses, particularly within the U.S., it doesn't provide a complete picture. For example, when it comes to addressing global issues like climate change, no single company has enough data to provide all the necessary insights.
Pereira suggested that teaming up with other tech giants, such as Google and Apple, would allow for more comprehensive datasets that could be used to tackle global challenges. By pooling their resources and expertise, these companies could create a broader, more holistic view of the problems they are trying to solve.
The OpenDP initiative, a Harvard-led community effort in which Microsoft partners with a range of academic and industry leaders, is one example of how collaboration is already taking place.

What’s Next?
Looking ahead, Pereira shared her excitement about Microsoft’s plans to extend its data efforts into new areas, particularly sustainability and climate change. By applying the same principles of privacy design and process transparency, Microsoft hopes to unlock datasets that can help tackle the global climate crisis.
While Microsoft has primarily used Windows telemetry data for its current projects, future initiatives might involve data from other sources, such as browsers or search engines, to provide more comprehensive insights into issues like energy usage and carbon emissions. These datasets could then be shared with researchers and policymakers to inform more sustainable decision-making.
Key Takeaways from Mayana Pereira’s Presentation
Big Data Drives Social Change: Microsoft’s Broadband and Digital Literacy datasets are being used by regulators and researchers to tackle the digital divide and improve digital literacy across the U.S.
Privacy by Design: Using differential privacy, Microsoft ensures that user data remains protected while still providing useful insights to policymakers.
Process Transparency: Microsoft’s commitment to transparency allows others to trust and replicate their methods, helping to build a community of responsible data publishers.
Collaboration is Key: By collaborating with other tech giants, Microsoft can unlock even more comprehensive datasets to address global challenges like climate change.
Data Utility and Privacy Are Not Mutually Exclusive: Microsoft has demonstrated that it is possible to release valuable datasets while maintaining strict privacy protections for users.
Data for Good
Mayana Pereira’s presentation made it clear that big data has the potential to drive meaningful social change when used responsibly. Through projects like the Broadband Data and Digital Literacy Data sets, Microsoft is helping to close the digital divide while maintaining strong privacy protections for users.

If you're interested in more takeaways from the Summit, be sure to check out our recaps from Day 1 and Day 2 as well as the articles covering talks by Max Schrems and Andreas Dewes. You can also watch the full recording of Mayana Pereira's presentation on our YouTube channel. To learn more about how your organisation can leverage big data to make a positive impact, contact us today.