In January, my family and I returned from Germany, landing at LAX after an 11-hour flight. At passport control, I watched travelers glance into a mounted screen and either get waved through or sent to an officer. When it was our turn—face to camera, click, green checkmark—we were cleared in seconds.
Behind that screen is a computer-vision system that compares travelers’ photos against images already held by U.S. Customs and Border Protection to verify identity. It felt like a straightforward example of AI improving government efficiency. But InnovateUS’s workshop, Prediction Isn’t Intelligence: How Predictive Models Really Work in Government, made me rethink why some uses of AI are relatively low-risk while others are inherently riskier.
The workshop’s core point is simple: identity matching can be tractable; predicting human behavior is far less so, and the costs to government and the public of getting it wrong are high.
Arvind Narayanan, Professor at Princeton University and Director of the Center for Information Technology Policy, argues that predictive AI has an inherent ceiling: “The future has not been determined yet… It doesn’t matter how much data you throw at the problem… predictive AI has inherent limits to accuracy.” He contrasts this with generative AI, where there is no comparable ceiling and capabilities can keep improving as models advance. Prediction isn’t useless, but it is uncertain—and that uncertainty becomes dangerous when probabilities quietly turn into decisions.
One case study that makes this concrete comes from hiring. Some tools claim they can infer a candidate’s personality—the Big Five/OCEAN traits (Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism)—from a short interview video. A journalistic investigation by German public broadcaster BR tested one such system and showed how fragile these “predictions” can be: the same person gives the same answers, yet scores shift with small changes—like a plain wall versus a bookshelf, glasses, a headscarf, or different lighting.
Small changes in a video flipped “personality” scores. Source: Bavarian Broadcasting
That’s the warning sign. If a “personality” score moves because of a background or accessories, the model is likely reacting to shortcuts, not stable traits. It may look precise, but it’s easy to game and can punish superficial context instead of job-related ability.
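One way to surface this kind of shortcut learning is a simple perturbation check: hold the candidate and their answers constant, change only irrelevant context, and see how far the scores move. The sketch below assumes a hypothetical score_personality(video) function standing in for whatever scoring interface a vendor exposes; it illustrates the test, not any specific product.

```python
# A minimal perturbation check for shortcut learning. `score_personality` is a
# hypothetical stand-in for a vendor's scoring function; it should return a
# dict of trait scores, e.g. {"conscientiousness": 0.62, ...}.
import statistics

def perturbation_gap(score_personality, base_video, perturbed_videos,
                     trait="conscientiousness"):
    """Largest and average shift in one trait score when only irrelevant
    context changes (background, lighting, glasses, headscarf)."""
    base = score_personality(base_video)[trait]
    shifts = [abs(score_personality(v)[trait] - base) for v in perturbed_videos]
    return max(shifts), statistics.mean(shifts)

# If the maximum shift is large relative to the score scale, the model is
# reacting to context rather than to anything stable about the person.
```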
“The real innovation is often less the technology itself and more the workflows we design to integrate that technology into existing processes.” - Arvind Narayanan
Rebecca Cai, Chief Data Officer of the State of Hawaii, offers a different angle: one where predictive models can help when paired with strong governance. She describes Hawaii’s climate and disaster planning—using modeling and “digital twin” approaches to understand tsunami risk, sea-level rise, and how infrastructure (like drainage systems) behaves under stress. Predictive modeling can be valuable here—especially for planning—but only if it’s explainable, stable, and auditable, because decisions affect real lives.
During the workshop, several risks recurred across examples:
- AI can be misused to harm the people it’s meant to help.
- Due process gets strained when models influence benefits, enforcement, or investigations.
- Feedback loops can become self-reinforcing when predictions shape the very data that trains the next model (see the sketch after this list).
- And benchmark scores can hide the most important failures.
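To make the feedback-loop risk concrete, here is a toy simulation of my own (not from the workshop): two areas with identical true incident rates, where each round’s attention is allocated according to past recorded counts. The allocation rule and all numbers are illustrative assumptions.

```python
# Toy feedback loop: recorded data reflect where we looked, and where we look
# is driven by what was recorded before. All numbers are illustrative.
import random

random.seed(0)
true_rate = [0.3, 0.3]   # identical underlying incident rates in areas A and B
recorded = [1.0, 1.0]    # prior recorded counts that "train" the next round

for step in range(20):
    total = sum(recorded)
    # allocate 100 inspections proportionally to past recorded counts
    inspections = [round(100 * r / total) for r in recorded]
    # recorded incidents depend on how hard we look, not only on true rates
    new_counts = [sum(random.random() < true_rate[i] for _ in range(inspections[i]))
                  for i in range(2)]
    recorded = [recorded[i] + new_counts[i] for i in range(2)]
    print(step, inspections, [round(r) for r in recorded])

# Small early differences in recorded counts shift future attention, and the
# gap can persist or grow even though the true rates never differed.
```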
The benchmark problem, the last risk on that list, shows up in one of the most striking examples Narayanan shares, and it highlights why evaluation is tricky.
Researchers built a model to predict complications for hospitalized pneumonia patients. The model appeared to show that asthmatic patients were less likely to have complications—a pattern that was genuinely present in the data. In reality, asthma is a major risk factor. What the model absorbed instead was a healthcare-system artifact: asthma patients are triaged as high-risk and typically receive faster, more intensive care, which lowers observed complication rates. Catching failures like this requires domain expertise and critical thinking, not just strong benchmark scores.
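A small simulation (my own construction, with made-up rates) shows how this kind of artifact arises: asthma genuinely raises untreated risk, but because asthma patients reliably receive intensive care, their observed complication rate ends up lower.

```python
# Toy data: asthma raises the untreated risk of complications, but asthma
# patients are triaged to intensive care, which lowers the risk that actually
# appears in the records. All rates are made up for illustration.
import random

random.seed(1)
patients = []
for _ in range(100_000):
    asthma = random.random() < 0.15
    base_risk = 0.25 if asthma else 0.10            # true untreated risk
    intensive_care = asthma or random.random() < 0.05
    observed_risk = base_risk * (0.2 if intensive_care else 1.0)
    patients.append((asthma, random.random() < observed_risk))

def complication_rate(group):
    return sum(had_complication for _, had_complication in group) / len(group)

with_asthma = [p for p in patients if p[0]]
without_asthma = [p for p in patients if not p[0]]
print("with asthma:   ", round(complication_rate(with_asthma), 3))    # ~0.05
print("without asthma:", round(complication_rate(without_asthma), 3)) # ~0.10

# A model fit to these labels "learns" that asthma is protective, because the
# labels encode the effect of treatment, not the underlying risk.
```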
Cai underlines why getting AI right is so challenging in government: data is often “collected for compliance, not for analytics,” and it sits scattered across silos, departments, and programs.
To mitigate the risks, she offers practical data governance advice:
- know what your data represents;
- watch for bias and “data pollution”;
- build feedback loops so systems improve under oversight.
When asked who is responsible when AI makes a mistake, Cai gives a simple answer:
“... you can’t blame the machine… Responsibility stays with humans, because we’re augmenting processes—not replacing them.”
“Governance, governance, governance. … Focus on the public impact and the business problem—not the technology itself—and don’t trust the vendor 100%.” - Rebecca Cai
Narayanan adds a concrete rule of thumb for buyers: be skeptical of “it worked elsewhere.” His one-word recommendation is “pilots”—test on your population, in your conditions, inside your workflow, and ask where training data came from and who it represents.
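In practice, a pilot can be as simple as scoring the candidate model on your own historical, labeled records and breaking the results down by the groups you serve. The sketch below is a generic illustration: `model`, the column names, and the metrics are assumptions, not a reference to any particular procurement.

```python
# A generic pilot report: evaluate a candidate model on local, labeled data and
# break performance down by subgroup instead of trusting one headline number.
# `model`, `label_col`, and `group_col` are illustrative assumptions.
import pandas as pd
from sklearn.metrics import precision_score, recall_score

def pilot_report(model, local_df: pd.DataFrame,
                 label_col: str = "outcome",
                 group_col: str = "region") -> pd.DataFrame:
    rows = []
    for group, part in local_df.groupby(group_col):
        features = part.drop(columns=[label_col, group_col])
        preds = model.predict(features)
        rows.append({
            "group": group,
            "n": len(part),
            "precision": precision_score(part[label_col], preds, zero_division=0),
            "recall": recall_score(part[label_col], preds, zero_division=0),
        })
    return pd.DataFrame(rows)

# If precision or recall collapses for a subgroup that matters locally,
# "it worked elsewhere" is not evidence that it will work here.
```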
Which brings me back to the green checkmark at LAX. That moment felt straightforward—almost automatic. But the workshop reframed what “straightforward” really means. The risk isn’t only the model; it’s how the system is used, what happens when it’s wrong, whether there’s recourse, and whether humans are verifying—or just rubber-stamping.
If you want the full set of case studies—and a clearer way to spot these risks before they show up in the real world—watch the full InnovateUS workshop.
If you want to know more about predictive optimization, follow Prof. Arvind Narayanan’s project Against Predictive Optimization or read his Substack.
This workshop is part of InnovateUS’s “Prediction, Automation, and Decision-Making with AI” series, co-hosted with the Princeton Center for Information Technology Policy.