Jennifer Pahlka's Recoding America captures what anyone who's worked in government can tell you: sometimes getting anything done feels like trying to “carve Mount Rushmore with a teaspoon.” We too often celebrate policy over implementation and compliance over practical effectiveness.

A new, expanded version of the report The Agentic State offers a compelling vision for how AI agents could help solve this implementation crisis. It's worth reading in full—the authors are some of the world's most accomplished digital government practitioners, and their systematic thinking about how AI could transform public services represents important work.

That said, I fear the prescription is premature. The problems they point to are serious. The right tools, carefully applied, can help governments improve operations. However, many of the problems the report identifies can be solved without agentic AI, and the dangers associated with deploying unregulated and immature technologies at scale are considerable.

Before we embrace speculative AI solutions from unaccountable commercial vendors, we should ask and answer: can't we do all of this now?

Kudos

The report's "Where We Are Now" sections are devastating and accurate. They describe government services as fragmented experiences that force citizens to understand and navigate bureaucratic logic rather than simply getting help. They describe workflows as “digitised paper trails riddled with bottlenecks” rather than processes genuinely designed to serve citizens. They describe procurement taking "months of paperwork and negotiation rounds" when it should take days or weeks. They describe policy-making as episodic and reactive rather than continuous and evidence-based.

Every government watcher will recognize these problems. They're real, they're urgent, and they genuinely harm people. When the report writes that “outdated workflows hard-code delay and inefficiency into everything the state does, leaving governments unable to govern effectively,” they're absolutely right. This isn't abstract inefficiency—it's people waiting months for benefits they're entitled to, businesses giving up on permits they need, and governments unable to respond at the speed modern problems require. We're seeing the consequences right now with the inability to react to changes in Medicare and SNAP.

The Vision is Inspiring

The report imagines a world in which governments anticipate citizen needs rather than waiting for the public to fill out burdensome paperwork, coordinate seamlessly across agencies without forcing citizens to navigate silos, adapt policies continuously based on real-world evidence, and provide truly personalized services through voice, text, or emerging interfaces.

When they describe services that "flow directly around people's lives" or procurement systems that "negotiate continuously" rather than through annual tenders, they're pointing toward something genuinely better than what we have now. And the authors are right that AI could play a role in making some of this possible.

When the report describes specific use cases, they're often genuinely compelling (albeit not all of them require agentic AI):

  • Voice-first information services for populations with limited literacy or smartphone access (like Zambia's pilot, which claims a 95% successful response rate)

  • Proactive benefit notifications where the government detects eligibility and contacts citizens rather than forcing them to discover programs themselves

  • Real-time compliance monitoring that catches problems before they cascade, rather than discovering violations months later through periodic audits

  • Crisis response systems that coordinate across agencies at machine speed rather than through phone trees and email chains

These point toward realistic possibilities for what we should aspire to and make clear that, done right, less wealthy countries could reap those benefits just as much as richer countries, despite lower smartphone penetration, higher illiteracy rates, and weaker infrastructure.

The Framework is Useful

Their 12-layer model—splitting implementation layers (where agents deliver visible value) from enablement layers (foundational requirements like governance, data, security, and culture)—is genuinely helpful for thinking about government transformation. It makes visible something often missed in technology discussions: you can't just add agents to existing systems and expect them to work. You need coordinated progress across governance frameworks, data infrastructure, technical architecture, cybersecurity, public finance models, and organizational culture.

This is exactly right. The report deserves credit for thinking systemically rather than offering a simple “add AI and stir” prescription.

The Risks

The authors acknowledge that "the deployment of agentic AI will come with inevitable risks and setbacks. Not everything will work." These warnings are important. 

My critique isn’t that the authors are blind to risks—it’s that they don’t follow through on what these admissions should mean for their prescriptions. To their credit, they devote an entire enablement layer to “Agent Governance: Accountability, Safety, and Redress,” calling for real-time oversight, explainability, and built-in procedural fairness. These ideas are important and welcome. However, the report treats them as design features that can be engineered into systems, rather than as institutional practices that must be continuously contested, enforced, and resourced.

Given the serious risks of automation and the difficulty of managing the hazards they identify, we need to ask why conventional approaches won't work and whether we really need AI agents.

For example: Citizens "experience government as a collection of separate departments" requiring them to navigate bureaucratic silos and repeat information across agencies. The report's solution: AI agents that orchestrate across departments, self-compose complete solutions, and provide personalized service.

But, as they point out, the key to solving the silo problem is to get rid of the silos.

Estonia solved this with X-Road, a data exchange layer that connects agencies and implements a "once-only" principle—citizens provide information once, and agencies share it securely. The UK solved it with GOV.UK's unified interface and shared platforms. Singapore solved it with whole-of-government architecture. In New Jersey, we’ve tackled this problem with our Business Experience project that unifies all the services a business needs in one place. 

None required agentic AI—they required organizational commitment to interoperability and shared standards. 

Before we embrace speculative AI solutions from unaccountable commercial vendors, we should ask and answer the question: can't we do all of this now?

The Evidence Gap

The authors deserve credit for attempting to ground their vision in evidence. They cite numerous examples from both government and the private sector. The report leans heavily on commercial deployments: Salesforce's Agentforce handling customer service, banking fraud-detection systems, and Heathrow's customer service bot achieving 90% resolution rates.

But much private sector use of agents is hype. A much-hyped MIT study suggests that 95% of current corporate AI investments fail to pay off, raising an obvious concern: if we don't yet know how to use LLMs well, how will we use agents? In any case, private sector deployments operate under fundamentally different constraints:

  • Risk tolerance: A bad movie recommendation is an inconvenience. A wrongly denied benefit claim means someone loses housing.

  • Accountability standards: Private companies don't face due process requirements, administrative law constraints, or political oversight.

  • Consequences of failure: When commercial systems fail, it's bad for business. When government systems fail catastrophically, it destroys public trust for generations.

Tellingly, consider the one example where they provide honest follow-up: Klarna's AI handled 2.3 million customer conversations monthly. Then, in parentheses, they note: "When customer satisfaction declined, the company reintroduced human agents for complex cases." Over-automation degraded the quality of services. 

In government, we can't just "reintroduce humans" after AI fails. The damage to trust and legitimacy would already be done. The government examples they give are all pilots, and most involve lower levels of automation than the commercial deployments cited.

Critical Conflations

The report conflates several distinct things that are worth unpacking:

General AI Capabilities ≠ Agentic Readiness - While LLMs have improved at translation, drafting, text simplification and other single-step tasks, we are not very far along in multi-step autonomous reasoning or reliable, automated decision-making. 

Late to Past Tech ≠ Early to Experimental Tech - The fact that governments were slow to embrace the internet, cloud, and mobile doesn't mean we should rush to embrace untested agents. They write: "Every prior delay in tech adoption has left public institutions weaker, costlier, less trusted." Not so: thank goodness the government didn't buy everyone a Google Glass, start making Vines, or move government agencies into Second Life. Problems have beset the early adoption of facial recognition. It's true that public institutions must urgently explore what's possible, but there's a crucial difference between testing and learning, on the one hand, and large-scale adoption, on the other.

User Agents vs. Government Agents - This might be the most revealing conceptual confusion. The report describes agents “acting on behalf of the user” that “anticipate needs” and provide "personalized" services. This sounds wonderful—like having a personal assistant who advocates for you in dealing with bureaucracy. But then they propose that the government's “biggest contribution” is “enabling users to 'bring their own agent' from private sector providers.” Having a personal assistant that navigates bureaucracy for you is a cool idea, but it's very different from government-side agents (what will actually be built) that optimize for efficiency. These are fundamentally different things with opposite implications for equity and power.

Speed vs. Understanding - One of the report's proposed success metrics is “Time to launch new digital services: 1 day.” But one of the leaders they quote, Tamara Srzentić, warns: “Real change begins with a deep understanding of human problems.” You cannot deeply understand human problems in one day. Real service design requires weeks of user research with diverse populations. Writing code might represent 5% of this work. The rest is understanding humans and their needs. The "1 day to launch" metric reveals a belief that service design is primarily a coding challenge. This is precisely the tech-solutionism that has failed repeatedly in government.

Getting the Sequence Right

You can't automate your way out of organizational dysfunction. Deploying AI agents into broken bureaucracies doesn't fix them; it automates dysfunction at machine speed. Organizational change is the prerequisite for any technology to work. Estonia, Singapore, and other digital leaders succeeded by first implementing hard organizational reform, long before agentic AI existed.

Don't get me wrong: more than anyone, I believe introducing new tools can create the incentive for change. But with so much agentic AI snake oil being peddled, we must not mistake these powerful tools for a panacea that will fix what ails our institutions. The unglamorous work of organizational reform, interoperability, training, and human-centered design remains essential—and technical shortcuts cannot substitute for it.
