Data Counterfactuals
What might change in an AI model if its training data changes?
This is a short memo meant to explain the datacounterfactuals.org project and website. It will be cross-posted on the Data Leverage Substack. The site exists to:
- show how cross-cutting the idea of "data counterfactuals" is (to name just a few relevant topics: data valuation, algorithmic collective action, data scaling, data selection, data poisoning, privacy, unlearning)
- make certain data counterfactual measurements easier to understand
- illustrate connections between technical and social data-centric work
There are many reasons we might want to understand how a specific piece of data impacts an AI model. Perhaps we want to inspect particularly valuable data, pay people based on the impact of their data (though this is a tricky endeavor!), or check data for errors. Or perhaps a group of people wants to withhold data for bargaining or protest. This web project grew out of discussions on data valuation, algorithmic collective action, data scaling, data selection, data poisoning, privacy, unlearning, and neighboring topics. Counterfactual questions about how data might change are foundational to all of these areas, so framing questions in terms of data counterfactuals can be practically and academically useful.
What is a data counterfactual?
A data counterfactual is a scenario in which the AI training data for a model changes in some way. Often, we are interested in comparing two counterfactual scenarios to understand the impact of some change on AI capabilities.
Consider this thought experiment: imagine you are going to train a machine learning model on a very small dataset with just four units of data. (If it seems implausible that we would ever do this, imagine instead a big dataset split into four distinct bundled subsets.) Now imagine a grid where every possible combination of training objects appears as a row, every possible evaluation set appears as a column, and each cell records the performance for a given train/eval pairing. We can call the four data objects A, B, C, and D. Again, these could literally be four single observations in a toy example, or four large datasets we are considering mixing.
With this grid in mind, we can explore the most basic useful data counterfactual: "leave-one-out". By comparing a row that includes one point with the neighboring row in which that point is missing, we can understand the impact (in a causal sense) of adding or removing that point. The difference between these two cells tells us how much a given data point helped or hurt our model. From there, the same logic extends to groups of points, reweighting data points, replacing data with synthetic data, corrupting certain examples, or coordinated withdrawal.
Concretely, we can imagine training an LLM on a mix of fiction books, science articles, and social media posts. If we train a second LLM without the science articles and compare the two models' performance, we are exploring the "no science articles" data counterfactual.
Leave-one-out toy example
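The thought experiment above can be sketched in a few lines of code. This is a minimal illustration, not part of the project itself: the "model" trained on a subset simply predicts the mean of its training values, and performance is negative squared error against a single evaluation target. All names and numbers below are invented for the demo.

```python
from itertools import combinations

# Four "data objects" A-D, each a single numeric observation; D is an outlier.
data = {"A": 2.0, "B": 4.0, "C": 6.0, "D": 40.0}
eval_target = 4.0

def performance(subset):
    """Train on `subset` (predict its mean), score against eval_target."""
    values = [data[k] for k in subset]
    prediction = sum(values) / len(values)
    return -(prediction - eval_target) ** 2  # higher is better

# One "grid" cell per non-empty training subset (rows); a single eval column.
grid = {subset: performance(subset)
        for r in range(1, 5)
        for subset in combinations("ABCD", r)}

# Leave-one-out: compare the full row with the neighboring row missing D.
full = grid[("A", "B", "C", "D")]
without_d = grid[("A", "B", "C")]
print(f"impact of D: {full - without_d:+.2f}")  # negative => D hurt the model
```

In this toy world, removing the outlier D moves the cell value from -81 back to 0, so D's leave-one-out impact is strongly negative; the same subtraction between any two neighboring rows answers the same kind of counterfactual question.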
Why data counterfactuals are relevant to data leverage
This frame helps us connect topics that might seem distinct, for instance linking influence estimation and Shapley values to data strikes and data contribution campaigns. In ML, we often want to ask questions about removing a point, reweighting a group, estimating an influence score, fitting a scaling curve, etc., with the purpose of understanding our data and model. But counterfactuals can also be induced by strategic actors. Data strikes, boycotts, contribution campaigns, and bargaining efforts try to change what data the world actually produces, or what data an AI developer can access on acceptable terms.
When people can withhold, redirect, or condition the supply of data, data counterfactual measurement directly maps to governance power! In other words, the kinds of experiments an ML researcher would run to improve a model (via data selection or other data-centric approaches) are the same experiments we would run to organize data-related collective action or set up an efficient data market. If we had a shared bank of results from such experiments, those results would be useful to actors with a wide variety of interests and goals! This frame also makes clear how questions about provenance, licensing, and contribution governance directly determine which training rows are legally, socially, or politically available in the first place.
Why data counterfactuals are relevant to core AI
This frame is not only relevant to data leverage or governance. It is also deeply relevant to what many people would recognize as "core AI" questions. If we want to know how models improve with more data, which data are worth keeping, which examples are redundant, what happens when we swap in synthetic data, why a model fails on one slice but not another, or how to remove the effect of a problematic subset, we are asking questions about what changes when the data change.
Data scaling, ablations, selection, influence estimation, distillation, synthetic data substitution, privacy interventions, poisoning, and unlearning are not all the same task. But they do all involve moving between nearby training worlds and asking how model behavior shifts as we move. The point is not to collapse everything into one technique. The point is to notice that many central ML questions are already, in a practical sense, data counterfactual questions.
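Data scaling is one of the cleanest examples of moving between nearby training worlds: each dataset size is a counterfactual, and a fitted curve lets us extrapolate to worlds we have not trained in. A minimal sketch, fitting a power law loss ≈ a·n^(-b) by linear regression in log-log space; the (size, loss) pairs are synthetic, generated from a known power law purely for illustration:

```python
import math

# Each pair is "loss after training on n examples" -- one counterfactual
# training world per dataset size. Synthetic numbers, for illustration only.
observations = [(n, 5.0 * n ** -0.3) for n in (1_000, 10_000, 100_000, 1_000_000)]

# Ordinary least squares on (log n, log loss).
xs = [math.log(n) for n, _ in observations]
ys = [math.log(loss) for _, loss in observations]
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
a = math.exp(mean_y - slope * mean_x)
b = -slope  # exponent of the fitted power law

# Extrapolate to the counterfactual "what if we had 10x more data?"
predicted = a * 10_000_000 ** -b
```

Because the demo data come from an exact power law, the fit recovers a ≈ 5.0 and b ≈ 0.3; with real training runs the same procedure gives a noisy but often useful estimate of unseen data counterfactuals.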
- Read the launch memo
More on the motivation for this site
- Open the glossary
Quick definitions for the recurring terms without leaving the main thread for long.
- Open the grid explorer
Explore the data counterfactuals "grid" framing directly.
- Compare formalisms
A more technical, web-only companion that lines up neighboring formalisms side by side.
- Related areas and papers
A dynamic, non-exhaustive set of related areas and readings, updated via Semble.