
Experimentation as Public Infrastructure

A New Moment for Public-Sector Learning

We are at a frontier moment in how governments encounter technology.

For the first time in a long time, state and local teams—technical and non-technical alike—can engage with powerful tools early. Advances in artificial intelligence (AI) have lowered the barrier to experimentation in ways that would have been unthinkable even a few years ago. Governments no longer need to wait for a fully built system to emerge at the end of a long procurement cycle before they can see, test, or understand what a technology actually does.

This is a meaningful shift. It puts governments closer to the driver’s seat than they have traditionally been and creates a real opportunity to shape how emerging technologies are adopted, governed, and used in public systems.

Why Learning Breaks Down When the Stakes Are High

Realizing this opportunity requires more than access to tools. It requires space and resources to learn what works in real contexts. That means running small-scale pilots where teams can test ideas, surface risks, and even fail—without high stakes or lasting consequences. Too often, the institutional conditions that make this kind of learning possible are still missing.

Public systems that deliver essential services are designed, understandably, for consistency and compliance. When failure carries real consequences for families, staff, and public trust, avoiding visible mistakes can feel like the only responsible option. As a result, experimentation is often treated as risky or out of bounds, even as leaders are asked to make decisions in areas with little precedent.

The need for government to learn, however, has not diminished. It has intensified.

States are being offered a wide range of promises about what AI can do: reduce administrative burden, improve accuracy, speed up processing, and support overextended staff. Some of these promises are real. Some are not.

When AI systems are implemented prematurely, the consequences extend well beyond inefficiency. At best, they add new burdens for staff. At worst, they can cause real harm, wrongly denying people benefits they are eligible for, erroneously flagging individuals for fraud, or exposing sensitive personal data through insecure systems.

Government cannot responsibly procure or govern AI-enabled systems if it does not understand their strengths, limitations, and tradeoffs.

That kind of discernment cannot be developed through vendor demos or slide decks alone. It comes from applied experimentation.

Yet government is not structured to support this kind of learning. Budgeting, procurement, and oversight processes are optimized for large, defined purchases—not for iterative discovery. Culture reinforces this dynamic. Teams are often rewarded for avoiding visible risk rather than surfacing uncertainty early. Experimentation is treated as dangerous, while buying and deploying poorly understood systems at scale is normalized.

In reality, the opposite is true.

I learned this lesson firsthand during my time working in Vermont state government. I had a front-row seat to the failure of our health insurance exchange in 2013 and later led its turnaround. We made mistakes that now seem obvious in hindsight. We took on too much at once. We failed to center real users. We didn’t fully understand the technology we were buying, and we lacked ownership or access to the underlying code. We believed that handing responsibility to a single large vendor would outsource risk.

But when the system failed, the risk tolerance snapped back to where it had always been. State leaders were held accountable by the legislature, the public, and the people who depended on the system. The lesson was painful but clear.

When Risk Can’t Be Outsourced

When it comes to running public systems, risk cannot be outsourced.

If we want systems that don’t fail at scale, we need places where it is actually safe for government to fail small.

This requires both dedicated resources and a shift in mindset. There must be environments where learning is a primary goal, not a secondary byproduct of implementation. This means creating places where teams can test emerging technologies against real problems, within real constraints, without putting essential services or public trust on the line. These environments must be bounded, responsible, and designed to surface risks early, when there are still options.

This is what we mean by experimentation as public infrastructure.

AI is the forcing function right now, but it will not be the last. New technologies will continue to move faster than public systems are built to absorb them. If governments are expected to adapt responsibly, they need modern public-sector R&D capacity: shared environments that reduce duplication across states, build collective understanding, and strengthen leaders’ ability to make informed decisions before those decisions become irreversible.

At the Center for Civic Futures, this is the role we designed the Public Benefit Innovation Fund (PBIF) to play.

PBIF functions as a living lab for public benefit systems—a controlled environment where states can work with builders to test promising ideas, generate real-world learning, and share those lessons with the broader field. Success is not defined only by what scales, but by what we learn, how early we learn it, and how that learning improves future decisions.

The demand for this kind of infrastructure has been unmistakable. In our first open call, we received more than 450 proposals from teams working across 45 states. What stood out was not just the volume of interest, but the maturity of the ideas. Many were grounded in real operational challenges and ready for disciplined experimentation.

With the support of our anchor funders and partners, we are now investing more than $8 million in a first cohort of projects designed to generate that learning. In this initial open call, we did not focus on moonshots (although we may do so in the future). Each project has moved beyond the idea stage, with a clear understanding of the problem it aims to solve and a deliberate plan for testing in ways that are bounded, responsible, and focused on producing usable insight for the field.

Much of this work targets the most complex and consequential parts of public benefit delivery. Some projects focus on eligibility and verification for programs like the Supplemental Nutrition Assistance Program (SNAP) and Medicaid, where administrative burden is high and even modest improvements can have outsized impact. In one multi-state effort anchored by Maryland, partners are building open-source AI tools to streamline work verification so other states can adopt, adapt, and improve them over time.

Other projects focus on the front door. This includes helping residents navigate complicated housing and benefits applications by translating dense rules into clearer steps and reducing the back-and-forth that slows access to support. Still others look inward, testing tools that help agency staff with training, policy navigation, and decision support so teams can apply rules more consistently and manage growing caseloads with less strain.

Some of these projects may ultimately scale. Others may not. Either way, they will produce learning that matters in environments where the stakes are high and the margin for error is small.

The Case for Experimentation as Infrastructure

This is why we believe that safe, structured spaces for experimentation are a form of public infrastructure.

Experimentation spaces lower long-term risk, strengthen public institutions, and give government leaders the tools they need to navigate change with confidence rather than react under pressure.

That is the work we are committed to at the Center for Civic Futures. While this effort is still in its early days, clear lessons are already coming into focus. As this cohort moves from pilot to proof, we are committed to sharing what we learn—what works, what proves harder than expected, and what responsible use of AI in real public systems actually requires.

If this resonates, we invite you to follow along and stay connected. We’ll be sharing additional insights, tools, and case studies in the months ahead as this work continues.