Across governments, companies, academia, civil society, and international organizations, there is a rapid push to establish guardrails for AI. From Bletchley Park (UK) to Delhi (India), and from Seoul (South Korea) to Paris (France), the rising frequency of AI summits—along with their expanding agendas and increasingly high-level, global participation—signals the urgency and importance now attached to developing safe and responsible AI.
At the multilateral level, this global momentum is reflected in the UN General Assembly’s resolution (A/RES/79/325), adopted in August 2025, which established two new mechanisms for global AI governance: an International Scientific Panel on AI and the Global Dialogue on AI Governance, an annual multistakeholder platform. Together, these mechanisms bring more than 100 countries, including many from the Global South and Least Developed Countries (LDCs), into a more inclusive and sustained conversation on AI governance.
The surge in activity and inclusivity is welcome and overdue. Yet the conversation remains incomplete: it focuses primarily on AI systems’ outputs while paying too little attention to the inputs that shape them.
AI governance is conceived primarily as a downstream exercise; we intervene only after the model has generated, spoken, and acted.
Unless the conversation on governing inputs is joined with the conversation on governing outputs, current AI governance efforts will fail to address what lies at the core of the AI governance ambition:
- Protection of fundamental rights: ensuring that AI systems uphold rather than undermine the core human rights enshrined in international law, by embedding human rights safeguards in AI systems, preventing rights violations, and ensuring accountability and oversight.
- Equitable distribution of AI benefits: ensuring that the economic, social, and development gains of AI are equitably shared across countries and communities, rooted in the idea that AI should contribute to human development without reinforcing or widening existing inequalities.
The official outcome statement from the AI Action Summit (Paris, 2025), for example, includes a commitment to human rights and inclusive development: “We underline the need for a global reflection integrating inter alia questions of safety, sustainable development, innovation, respect for international laws including humanitarian law and the protection of human rights, gender equality, linguistic diversity, protection of consumers and of intellectual property rights.” (The emphasis is ours.)
To get there, we must add the conversation on governing inputs (data governance) to the conversation on governing outputs (AI governance). That conversation is happening, but not with the frequency, visibility, or heads-of-government participation of the AI summits, and it runs largely on a parallel track.

For example, the Global Digital Compact (GDC), adopted by the UN General Assembly at the Summit of the Future (September 2024), established the Multi-Stakeholder Working Group on Data Governance Across All Levels, convened by the Commission on Science and Technology for Development (CSTD) to “advance responsible, equitable, and interoperable data governance approaches”. Yet even though the August 2025 UNGA AI resolution is a direct follow-up to the GDC, the data governance and AI governance dialogues are proceeding in parallel.
In what follows, we argue that AI governance cannot protect fundamental rights and achieve equitable distribution of AI benefits without data governance. Then, we share two key ideas from data governance that are ripe for supporting the development of safe, responsible, human-rights-respecting, and inclusive AI.
Two Interwoven Realms
AI governance today, with its focus on adversarial testing (such as red teaming) and on multi-turn, agentic, contextual, alignment, and behavioral-safety evaluations, emphasizes governing the outputs of AI models.
However, model behavior begins with data. Without vast troves of information, often scraped, repurposed, and repackaged without the knowledge of those who produced it, modern AI systems would not exist.
Most AI harms originate in data practices, not models: privacy breaches, discrimination, misclassification, the exclusion of groups with particular needs (such as children), and community-level harms.
And yet, data management and AI governance are treated as largely separate domains. This has led to a fragmented governance regime.
We have previously argued that data governance and AI governance must be approached as part of a single, integrated conversation. That argument not only still stands but becomes even more urgent in light of the two core ambitions shaping today’s global AI agenda: developing safe, trustworthy, rights-protecting, and rights-promoting AI, and scaling AI systems for equitable societal and economic benefit. Both ambitions depend fundamentally on the governance of the data on which AI relies.
- AI systems exist within—and depend entirely on—the broader data lifecycle: Data governance spans the full continuum of planning, collecting, accessing, retaining, sharing, and ultimately deleting data. AI systems are embedded in this lifecycle: they ingest existing data, transform it, and generate new outputs that often become new inputs. This tight coupling means that weaknesses in data governance flow directly into AI performance.
- Data governance ensures the quality foundations that AI requires: No amount of volume or scale can compensate for poor-quality or unrepresentative data. Weak data governance introduces systemic risks—such as hallucination amplification, embedded bias, and degraded safety or refusal behaviour—that cannot be solved by scale (i.e., more data) or model architectures alone. Robust data governance provides the processes, standards, and safeguards needed for trustworthy data foundations, including adequate representation (e.g., linguistic, demographic, geographical).
- Without data governance, AI systems silently inherit legal, ethical, and compliance risks: If training data is obtained through opaque or unlawful means—such as indiscriminate web scraping without consent or legal basis—AI models internalize those violations. The result is “data laundering,” in which privacy breaches, copyright issues, or other rights violations are encoded into model parameters, creating downstream compliance and reputational risks for deployers and regulators alike.
- A social license for AI depends on a social license for data re-use: Public trust in AI is impossible without public trust in how data is accessed, managed, and (re)used. Data governance provides mechanisms for transparency, community participation, expectation-setting, accountability, and redress. Only when trusted data practices are established can AI systems credibly claim legitimacy. Effective data governance is therefore essential not just for safety, but for the long-term viability of the AI ecosystem itself.
- Data governance is technology-agnostic—and therefore more durable: AI governance is necessarily tied to specific technological capabilities that evolve rapidly. In contrast, data governance principles—quality, purpose limitation, access controls, accountability—are stable across generations of technology. Investing in strong data governance provides a resilient foundation for whatever forms AI may take in the future.
- Data governance already offers the institutional architecture that AI governance seeks to build: Policymakers have decades of experience developing standards for data quality, privacy, interoperability, due diligence, and impact assessment. AI governance efforts can build on this well-established body of practice rather than starting from scratch. Recognizing data’s central role makes regulatory frameworks more coherent, efficient, and aligned with real-world AI development cycles, and building on an established base will yield effective solutions faster.
Data Governance Ideas Ripe for Enabling AI Governance
Data governance principles and practices provide relatively mature, upstream approaches that reduce risk, increase trust, protect rights, and support equitable AI. We explore two that are particularly promising for AI.
1. From Data Extraction to Data Stewardship
The recent explosion of AI capability has accelerated a global rush to collect and use data. Companies scrape websites, digitize cultural archives, and ingest vast quantities of personal and public information, often with little transparency. This extractive model is problematic on several fronts, from the violation of fundamental rights to the one-way flow of data-generated profits, which concentrates wealth and power. While individuals and communities bear the risks and harms of this extractive regime, they see few of its benefits.
Data stewardship offers an approach for addressing the issues created by today’s AI-driven data appetite. Central to data stewardship is the understanding that data is a shared asset, not a resource to be captured, exploited, and commodified without concern. This implies that those who produce, govern, process, and use data carry specific responsibilities to the individuals and communities the data comes from and/or is about. Data stewards, whether individuals or institutions, fulfill these responsibilities and hold others who process data accountable, too.
In particular, data stewards:
- Embed rights protections: stewards ensure data is produced and used only for legitimate, rights-respecting purposes, with particular consideration for vulnerable groups (e.g., children) and with privacy and safety built into system design.
- Improve data quality and representativeness: stewards clean and curate data, assess and ensure representation, including that of vulnerable groups and marginalized communities, and address historical biases.
- Create transparency across the data lifecycle: stewards curate metadata, version datasets, and document data flows and data use (see the sketch after this list).
- Provide accountability and oversight: stewards act as named, accountable actors, creating formal responsibility for protecting people and their interests and, by offering recourse, returning agency to the individuals and communities the data comes from and/or is about.
- Enable benefit sharing: through transparency and oversight, stewards can create the enabling conditions for value generated from local data to benefit local people, for example by producing insights or (AI) tools that address their needs.
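To make these responsibilities concrete, here is a minimal, hypothetical sketch in Python of the kind of provenance record a steward might maintain. All names (DatasetRecord, may_use, and the example values) are invented for illustration and are not drawn from any existing standard or tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical provenance record a data steward might maintain."""
    name: str
    version: str                    # datasets are versioned, never silently overwritten
    collected_for: str              # the legitimate purpose the data was produced for
    legal_basis: str                # e.g., informed consent, statute, contract
    communities: tuple[str, ...]    # who the data comes from and/or is about
    known_gaps: tuple[str, ...]     # documented representation gaps and biases
    approved_uses: tuple[str, ...]  # purpose limitation: the only permitted uses


def may_use(record: DatasetRecord, proposed_use: str) -> bool:
    """Any use outside the approved list is escalated to the steward for review."""
    return proposed_use in record.approved_uses


survey = DatasetRecord(
    name="regional-health-survey",
    version="2.1.0",
    collected_for="public-health planning",
    legal_basis="informed consent",
    communities=("district A", "district B"),
    known_gaps=("under-representation of residents without internet access",),
    approved_uses=("public-health planning", "academic research"),
)

print(may_use(survey, "academic research"))  # True: within the approved purposes
print(may_use(survey, "ad targeting"))       # False: escalated for steward review
```

Even this toy record makes the bullets above tangible: versioning and documented gaps create transparency, while the approved-uses list operationalizes purpose limitation and gives recourse a concrete anchor.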
Data stewardship, in short, provides the upstream discipline that AI governance currently lacks.
2. The Case for Data Commons
Today, regions, countries, and corporations that already have large digital economies, robust data infrastructure, skilled technical labor, and compute capacity can turn data into high-value AI products faster than others. AI creates winner-takes-most markets, exacerbated by uneven regulatory power, in which developed economies and companies with powerful lobbies can shape global AI norms, interoperability standards, market access rules, and data and compute export controls.
Developing economies, less powerful communities, and individuals have limited influence over how their data is used, who benefits from the resulting AI, and what risks they absorb: regulatory asymmetry reinforces economic asymmetry.
A data commons can help shift this dynamic. A data commons is a collectively governed system that manages how data is stewarded, from production to processing and use, and embeds benefit-sharing mechanisms to ensure that the value created from data serves the community that contributes it. Critically, a data commons is not merely a repository of data but a governance regime, which distinguishes it from similar arrangements such as data lakes or hubs that aggregate data without embedding rights and responsibilities. In particular, a data commons:
- Provides community-level agency and benefit sharing: a data commons gives individuals and communities a voice in its governance and, as such, empowers them to guide the use of data from and about them to their benefit.
- Gives bargaining power: individuals and communities gain influence by pooling their data, metadata, and technical infrastructure under shared rules (access, use, benefit-sharing, etc.), as sketched below.
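As a thought experiment, the sketch below (again in Python, with invented names such as CommonsRule and request_access; no real system is being described) shows what distinguishes a commons from a plain repository: access is mediated by collectively set rules that encode permitted uses, community approval, and benefit sharing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonsRule:
    """One hypothetical access rule in a collectively governed data commons."""
    permitted_use: str
    benefit_share: float               # fraction of derived value returned to contributors
    requires_community_approval: bool


# Illustrative rulebook: the governance layer that a data lake or hub lacks.
RULEBOOK = {
    "local-service-delivery": CommonsRule("local-service-delivery", 0.0, False),
    "commercial-model-training": CommonsRule("commercial-model-training", 0.15, True),
}


def request_access(use: str, community_approved: bool) -> bool:
    """Grant access only when a rule exists and its conditions are satisfied."""
    rule = RULEBOOK.get(use)
    if rule is None:
        return False  # no rule, no access: new uses are negotiated collectively
    return community_approved or not rule.requires_community_approval


print(request_access("local-service-delivery", community_approved=False))     # True
print(request_access("commercial-model-training", community_approved=False))  # False
```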
Data commons serve as tools for correcting global AI value asymmetries by changing the structure of who controls and benefits from data: a data commons creates the governance and bargaining power needed to ensure AI becomes an equalizer rather than an extractor. A data commons represents a shift from competitive data hoarding toward shared infrastructure, mirroring how societies have historically governed other essential resources such as roads, libraries, or scientific databases.
Because data commons help level the playing field, well-governed commons can also unleash new forms of innovation. When access to high-quality data is no longer restricted to a handful of dominant countries or corporations, a broader ecosystem of researchers, public institutions, and startups can meaningfully participate.
In short, data commons are not just tools for better AI—they also provide the foundations for a more inclusive and innovative AI economy.
Input and Output Governance
An entire ecology of data governance tools and norms has emerged over the last decades. Embedding these mature practices, and the field that produced them, into the lifecycle of AI development is essential. If we intervene only after a system is trained and deployed, we are effectively governing at the point of no return, when many risks and harms are already baked into the model’s internal logic. A responsible AI agenda must therefore pursue two complementary pillars:
- Output governance, focused on existing AI governance concerns such as model safety, evaluation, alignment, and transparency.
- Input governance, grounded in more traditional data governance concerns such as stewardship and benefit sharing.
These are not competing approaches; they are mutually reinforcing parts of the same stream.
Strong data governance makes AI safety easier, more effective, more credible, and more trustworthy.
AI governance efforts can strengthen and extend longstanding practices in data protection and responsible data management. Seen this way, governing AI well means governing data well and vice versa. Integrating the two is not an expansion of the governance agenda but a return to first principles: ensuring that the inputs of AI are as responsibly managed as its outputs.
Next Steps
What does this mean in practice? Concretely, integrating data and AI governance requires a vigorous commitment to collaboration and integration across all AI building blocks, workstreams, institutions, and individuals, along with action across the policymaking and regulatory ecosystem. Among other steps, this could mean that:
- Data and AI governance must be part of the same conversation. At AI summits, data governance must be central to the agenda, and the UN mechanisms for global AI governance must be linked to the UN conversation on international data governance.
- Data governance must receive the visibility and high-level participation needed to deliver on existing political commitments. This requires putting in place the policies and institutions that support and formalize data stewardship and data commons approaches, which is critical for protecting rights and for the equitable distribution of data and AI benefits in the service of the public good.
- Institutions and other structures for data and AI governance should examine their mandates to identify opportunities to contribute across the data and AI governance spectrum: data protection authorities, for example, might integrate AI training pipelines into their oversight and auditing mandate, and, conversely, AI safety institutes might integrate data governance responsibilities into theirs.
- Competition and digital markets regulators should treat access to high-quality data as a structural condition for fair AI markets, and as a condition for sustainable digital development and future economies.
- National statistical offices and research agencies should build and maintain high-quality, high-trust public datasets to serve the public interest (a type of data commons).
- Multilateral institutions should invest in identifying critical data commons that lower barriers to the development of a digital economy in LDCs, as well as data commons serving key public-good purposes (e.g., climate).
- The technical and open-source communities should invest in developing modular, open-source infrastructure. This can support (collectively) governed data exchanges and reduce dependence on a handful of compute infrastructure providers.
The task at hand is to build an upstream AI governance architecture, focused on data commons, data stewardship frameworks, and well-governed, high-quality datasets, that enables the public good and the equitable distribution of data and AI benefits.
Oversight of the data ecosystem is one of the clearest levers we have for shaping AI in the public interest. It is a strategy with immediate benefits and long-term structural impact.
Image by Janet Turra & Cambridge Diversity Fund, via Better Images of AI, licensed CC BY 4.0. Changes: colors altered; shape adjusted; mirrored and duplicated.