Reflections on the Rival Nature of Data

Our thoughts on a recently published research paper

Apr 4, 2025

A recently published paper sheds light on the legislative implications of data usage and sharing. It proposes that data is a non-scarce but rival good: although it can be copied freely, one party learning too much from it can prevent another party from learning what they need. Below we share the parts of the paper we found most interesting, along with our thoughts on the work presented.

The authors begin by defining rival and non-rival goods, with examples of each:

“Rival goods are those that can be consumed by no more than one person at a time. The use of a rival good by one person significantly reduces the ability of other parties to enjoy that good. Non-rival goods are those that can be used by more than one person simultaneously. The use of a non-rival good by one person does not impact the ability of others to simultaneously enjoy the non-rival good.”

From this definition, it may seem that any good or service which does not disappear upon consumption can be classified as non-rival, since its consumption by one individual does not preclude others from enjoying it. Examples could include streaming websites (many people can visit them simultaneously) or a public fireworks display (while this is a physical event, one person's enjoyment of it is not affected by other people watching at the same time).

By this principle, data looks like a textbook non-rival good: multiple people can access the same data simultaneously without reducing the benefit anyone else derives from it. While this logic may seem sound, it is, unfortunately, not fully correct in the age of data privacy regulations.

Concretely: should one party obtain access to a sensitive dataset, this may in fact preclude another party from either accessing the data at all (due to access control and data minimisation requirements in certain data protection regulations) or from enjoying the same benefits (if the data confers a competitive advantage, the timing of access can significantly affect its value).

Moreover, the more organisations embrace differentially private (DP) data processing, the more obvious the rival nature of data becomes. As the authors state:

“This ability to reason about how privacy losses accumulate across data uses means that a privacy budget may be set for one or more datasets containing information collected from a subject; the budget should express an acceptable level of privacy loss. The privacy budget may then be divided between analysts who will perform computations (with differential privacy guarantees) over the relevant datasets. This budget, shared between multiple parties, makes the rival nature of the data explicit.”

So when multiple analysts attempt to query the same dataset under DP data processing, they are competing against each other for a privacy budget that must be shared across all data accesses. Once this budget runs out, the dataset has to be retired entirely: no further information can be released from it under DP.
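To make this concrete, here is a minimal sketch of what such a shared budget accountant could look like. It assumes basic sequential composition (the epsilon costs of individual queries simply add up against one total) and uses the Laplace mechanism for a private mean. The names `PrivacyBudgetAccountant` and `dp_mean` are ours, purely for illustration; a real deployment would rely on a vetted DP library rather than hand-rolled accounting.

```python
import numpy as np


class PrivacyBudgetAccountant:
    """Tracks one privacy budget shared by every analyst querying a dataset.

    Assumes basic sequential composition: the epsilon costs of
    individual queries simply add up against a single total budget.
    """

    def __init__(self, total_epsilon: float):
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def spend(self, epsilon: float) -> None:
        # Refuse the query outright if it would overdraw the shared budget.
        if self.spent + epsilon > self.total_epsilon:
            raise RuntimeError("Privacy budget exhausted: no further queries allowed.")
        self.spent += epsilon


def dp_mean(data, lower, upper, epsilon, accountant):
    """Differentially private mean of bounded data via the Laplace mechanism."""
    accountant.spend(epsilon)  # charge the shared budget before answering
    clipped = np.clip(data, lower, upper)
    # Sensitivity of the mean of n values each bounded in [lower, upper].
    sensitivity = (upper - lower) / len(data)
    noise = np.random.laplace(scale=sensitivity / epsilon)
    return float(clipped.mean() + noise)


# Two analysts compete for the same budget on the same dataset.
accountant = PrivacyBudgetAccountant(total_epsilon=1.0)
ages = np.array([34, 41, 29, 57, 62, 38, 45])

print(dp_mean(ages, 0, 100, epsilon=0.5, accountant=accountant))  # analyst A
print(dp_mean(ages, 0, 100, epsilon=0.5, accountant=accountant))  # analyst B

try:
    dp_mean(ages, 0, 100, epsilon=0.5, accountant=accountant)     # analyst C
except RuntimeError as err:
    print(err)
```

Once `spend()` raises, the dataset can answer no further DP queries for anyone: exactly the shared, depletable resource the quote above describes.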

Moreover, over-reliance on the same dataset can lead to overconfidence and overfitting, producing conclusions that seem sound but that (e.g. in the context of machine learning model training) actually reduce the usefulness of the dataset, because the model starts to learn its idiosyncrasies rather than patterns that generalise.

For instance, should a chest X-ray classifier overuse the same small dataset of patients with very specific attributes (e.g. only X-rays containing features associated with viral pneumonia), the model may completely fail to generalise to any other dataset. This can be mitigated by limiting how much the model learns from the data through early stopping, regularisation and similar techniques, but that does not remove the core challenge: data can often be considered a rival good not only when multiple parties try to access it, but also when the same party accesses it repeatedly, with each access reducing the benefit (and, past a certain point, turning it into a detriment).
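As an illustration of the "limit how much the model can learn" point, below is a hedged sketch of early stopping: training halts once validation loss stops improving, so the model cannot keep extracting (and memorising) the idiosyncrasies of a small dataset. The `train_step` and `evaluate` callables are hypothetical placeholders, not anything taken from the paper.

```python
def train_with_early_stopping(model, train_step, evaluate,
                              max_epochs: int = 100, patience: int = 5):
    """Stop training once validation loss has not improved for `patience` epochs.

    `train_step(model)` runs one epoch of training and `evaluate(model)`
    returns a validation loss; both are placeholders for this sketch.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        train_step(model)
        val_loss = evaluate(model)

        if val_loss < best_loss:
            best_loss = val_loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1

        # Halt before the model over-extracts from the small training set.
        if epochs_without_improvement >= patience:
            print(f"Early stopping at epoch {epoch}: best val loss {best_loss:.4f}")
            break

    return model
```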

Both of these issues are summarised in the concluding section of the work:

“In the context of privacy, the ongoing use of data continuously leaks more information about the dataset, depleting its privacy budget. From a statistical perspective, the adaptive reuse of data creates risks for the validity of the conclusions of statistical analyses, endangering the progress of scientific understanding. These concerns should be reflected in regulation restructuring the data ecosystem and, in particular, in any regulation attempting to encourage the reuse of data. We observe that in its current state, the EU Data Governance Act, part of the European Strategy for Data, risks undermining its own goal of ensuring the safe sharing and reuse of data.”

The paper does an excellent job of highlighting the need for, and importance of, team-level privacy budget allocation, which is a feature of our solution AGENT. You can read more about AGENT here, or get in touch if you want to see more.
