
Author Note: This essay is adapted from remarks delivered at a recent webinar with InnovateUS on algorithmic bias in policing.

[Infographic: Good Guys AI]

Criminals and other bad actors use artificial intelligence (AI) tools to target vulnerable communities. This naturally prompts the question: “If the bad guys are using predictive AI tools to find holes in our defenses, why can’t the good guys do the same?” Shouldn’t law enforcement use the same computational power to stay ahead? The answer, building on Professor Ferguson’s foundational 2017 critique “Policing Predictive Policing,” reveals why predictive policing systems continue to fail despite millions of dollars in investment, and why police departments keep buying systems that demonstrably don’t work.

Two fundamental asymmetries make the comparison flawed. First, criminals operate outside the law, while police must act with justification and be held publicly accountable for their actions. Second, criminals only need to be right a few times: a robocaller doesn’t care if you send the call to voicemail; it just needs a few people to answer. Police interventions operate under very different constraints.

A false positive in predictive policing means armed officers appearing in your neighborhood, your name added to a watchlist, or a stop based on algorithmic suspicion rather than observed behavior.

Low-Stakes Algorithms in High-Stakes Contexts

The core issue is that the AI systems we encounter daily were designed for contexts where being wrong has trivial consequences. Netflix recommends a terrible movie, and you waste a few minutes before scrolling to something else. Google surfaces an irrelevant search result, and you click back to try different keywords. TikTok shows content you’re not interested in and you swipe past it. Amazon suggests products you don’t want, and you ignore them.

These systems were designed to optimize for engagement, not accuracy. For speed, not accountability. For “good enough,” not “probable cause.” They tolerate high error rates because the cost of being wrong is negligible—a few seconds of annoyance before moving on.

It is very different when police departments buy systems built on the same logic and apply them to the highest-stakes decisions imaginable: who gets surveilled, who gets stopped, who gets flagged as a potential criminal, who gets added to watchlists that follow them for years. The vendors selling predictive policing software come directly from this low-stakes world.

PredPol was inspired by algorithms predicting earthquake aftershocks—a scientific model for natural phenomena, repurposed to predict human behavior. Palantir built data mining tools for law enforcement that were designed to mimic those used in Silicon Valley.

In effect, these companies are selling recommendation engines to police departments, repackaging the logic of “you might also like this product” as “this person might commit a crime.” The results are exactly what you would expect when you use a content recommendation engine to decide who gets a knock on their door from armed officers.

What Happens When We Ignore the Asymmetries

The Markup’s analysis of a predictive policing system in Plainfield, New Jersey, offers a stark example of these costs. The study examined 23,631 crime predictions generated by Geolitica (formerly PredPol) for the Plainfield Police Department between February and December 2018. Fewer than 100 of those predictions matched crimes later reported to the police, a success rate of less than 0.5%.

In practice, that means more than 23,500 predictions sent officers to locations where no predicted crime occurred, adding police presence based on algorithmic guesswork that proved wrong more than 99 times out of 100.
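For a sense of scale, here is a back-of-the-envelope check of The Markup’s figures, treating the reported “fewer than 100” matched predictions as an upper bound of 100:

```python
# Back-of-the-envelope check of the Plainfield figures reported by The Markup.
# "Fewer than 100" matches is treated as an upper bound of 100.
total_predictions = 23_631
matched_predictions = 100  # upper bound on predictions that matched a reported crime

hit_rate = matched_predictions / total_predictions
print(f"Hit rate: {hit_rate:.2%}")  # about 0.42%, i.e. under 0.5%
print(f"Predictions with no matching crime: {total_predictions - matched_predictions:,}")
```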

Chicago’s Strategic Subject List is another example of a failed system. First deployed in 2012, the SSL used an algorithm to assign risk scores to the individuals deemed most likely to be involved in gun violence, as either perpetrators or victims. The list eventually ballooned to over 400,000 people, including 56% of the city’s Black men between the ages of 20 and 29. The program was discontinued in 2019 after the Chicago Police Department’s Inspector General raised concerns about its efficacy, but only after years of treating nearly half a million people as algorithmically suspect. The Chicago SSL illustrates both asymmetries perfectly: secret scores that individuals couldn’t contest or even know about (accountability failure), and a system that flagged hundreds of thousands of people on the theory that catching a few justified surveilling everyone (error tolerance failure).

Prediction Versus Diagnosis

The vendors and departments deploying predictive policing systems make a category error. They conflate two completely different uses of data: predictive and diagnostic. Predictive use refers to utilizing data to forecast future events, including who is likely to commit crimes, where future incidents are likely to occur, and which individuals present an elevated risk. Diagnostic use refers to the application of data to understand what has already happened—where patterns of harm are present, which environmental factors correlate with crime, and what the actual outcomes of interventions were.

The distinction matters because predictive use depends on a core assumption: that patterns observed in historical data will repeat in predictable ways at the individual level, allowing us to forecast individual actions. All available science indicates that attempting to forecast individual actions is extremely challenging, even with rich data and sophisticated algorithms.

For example, my colleague Matt Salganik at Princeton conducted the Fragile Families Challenge, inviting research teams from around the world to predict life outcomes for children using high-quality data on thousands of families: about 13,000 variables in total covering family income, parental education, neighborhood characteristics, and health records from birth through age nine. The teams used cutting-edge computer science methods combined with social science expertise, and the predictions were still not very good; even the best models barely outperformed simple baselines. Given that result, why should we believe a vendor can predict the much rarer, more complex event of a future crime?
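To make that kind of benchmark concrete, here is a minimal sketch, on purely synthetic data, of the comparison the challenge relied on: a flexible model versus a trivial baseline on held-out cases. When individual outcomes are dominated by factors the data never captures, the gap between the two stays small no matter how sophisticated the model is.

```python
# Illustrative sketch on synthetic data: a flexible model often barely beats a
# trivial baseline when individual outcomes are mostly driven by unobserved noise.
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, p = 4000, 50
X = rng.normal(size=(n, p))
# Outcome depends only weakly on the observed features; most variation is unobserved.
y = 0.3 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(scale=1.0, size=n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

baseline = DummyRegressor(strategy="mean").fit(X_tr, y_tr)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)

print("Baseline R^2:", round(r2_score(y_te, baseline.predict(X_te)), 3))
print("Model    R^2:", round(r2_score(y_te, model.predict(X_te)), 3))
```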

There are several fundamental flaws in such predictive policing systems, as my colleagues explain comprehensively in “Against Predictive Optimization.” Chief among them is that these systems optimize to match past police behavior, not to identify actual danger. When these algorithms are built, they use historical data to train a model on features that may be predictive, such as residential address, zip code, time of day, prior arrests, and neighborhood characteristics. They then fit regression models or more sophisticated machine learning methods to the arrests recorded in the past, on the assumption that the same patterns will recur in the future.
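A minimal, hypothetical sketch of that pipeline makes the problem visible. None of the column names or data below come from any real system; the point is simply that whatever features go in, the label the model is fit to is “was an arrest recorded here before,” not “did a crime actually occur.”

```python
# Hypothetical sketch of a predictive-policing training pipeline (invented data).
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Each row is a place/time cell with features and whether police recorded an
# arrest there. "arrest_recorded" reflects police activity, not ground truth.
history = pd.DataFrame({
    "zip_code":        ["07060", "07060", "07062", "07063", "07062"],
    "hour_of_day":     [22, 14, 23, 2, 15],
    "prior_arrests":   [12, 12, 3, 1, 3],
    "arrest_recorded": [1, 1, 0, 0, 1],   # <-- the target is past police behavior
})

features = ColumnTransformer(
    [("zip", OneHotEncoder(handle_unknown="ignore"), ["zip_code"])],
    remainder="passthrough",
)
model = Pipeline([("features", features),
                  ("clf", LogisticRegression(max_iter=1000))])

X = history.drop(columns="arrest_recorded")
model.fit(X, history["arrest_recorded"])

# The "risk score" for a cell is therefore an estimate of where police have
# made arrests before, under a different name.
print(model.predict_proba(X)[:, 1])
```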

But the algorithm doesn’t know why these patterns exist. It just sees them. If police have historically made more stops or arrests in Black neighborhoods, the algorithm learns to predict more activity in Black communities. If young men are arrested more often, it learns that young men are high risk. The model optimizes to match past police behavior rather than identifying actual criminal activity.

What we think we’re measuring is criminal behavior. What we’re actually measuring is the frequency of police contact: arrests, citations, and reported crimes. An arrest record doesn’t tell you where crime is happening; it tells you where police made an arrest. If police patrol a Black neighborhood more heavily, they will make more stops there. More stops generate more data. More data makes the algorithm predict more police activity there, which leads to more patrols, which generate more stops. The feedback loop is self-reinforcing, but it is not predicting crime. It is predicting, and perpetuating, patterns of police activity.
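That loop is easy to reproduce in a toy simulation. In the hypothetical sketch below, two neighborhoods have identical underlying offense rates, but one starts with three times the patrols; reallocating future patrols in proportion to recorded stops means the initial disparity persists and feeds itself rather than washing out. All numbers are invented.

```python
# Toy simulation of the predict-patrol-record feedback loop (all numbers invented).
import numpy as np

rng = np.random.default_rng(42)

true_offense_rate = np.array([0.10, 0.10])  # identical in both neighborhoods
patrols = np.array([60.0, 20.0])            # neighborhood A starts with 3x the patrols
recorded_stops = np.zeros(2)

for _ in range(20):
    # Stops scale with patrol presence, not with the (equal) underlying offense rate.
    recorded_stops += rng.poisson(patrols * true_offense_rate)
    # "Predictive" allocation: send the fixed patrol budget where the data says activity is.
    patrols = 80 * (recorded_stops + 1) / (recorded_stops.sum() + 2)

print("Recorded stops:", recorded_stops)
print("Final patrols: ", patrols.round(1))
# The data, and the patrols, stay concentrated in the neighborhood that was
# policed more heavily to begin with, even though offense rates never differed.
```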

A Story About Surveillance and Accountability

Sarah Brayne, a sociologist who spent years embedded with the LAPD studying how they use big data and surveillance technologies, observed something revealing during a ride-along. She noticed an officer manually calling in his location on the laptop and asked why, given that the cars were equipped with GPS tracking. The officer explained that the automatic vehicle locators were turned off because the union wouldn’t accept them.

The officers understood that being tracked and having their movements monitored felt intrusive. They recognized it as surveillance. Because they had the power to refuse it, they did.

Now imagine a supervisor using an early intervention system to predict which officers are at elevated risk for misconduct. The system flags an officer for intervention. What questions would that officer ask? What did I do to get flagged? Can I see the data? What if there’s an error? Can I challenge it? Who else was flagged? Are there patterns? What happens if I refuse the intervention? Will this affect my career? Was this based on my actual behavior, or was I lumped in with others algorithmically? These are all legitimate questions. And they are exactly the same questions that community members ask when law enforcement puts in new surveillance systems. The critical difference is that officers often have institutional power to push back. Community members typically don’t.

What Actually Works: Using Data to Diagnose Problems in Oakland and Richmond

Using data for diagnosis offers a constructive alternative. When police departments use data diagnostically, they can identify locations with repeated 911 calls and investigate what’s happening there that might be changed, track their own enforcement patterns to detect racial disparities in stops and searches, measure whether interventions actually reduce harm rather than just generate arrests, and hold themselves accountable to communities by reporting transparent metrics.
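As a contrast with the predictive pipeline sketched earlier, here is a hypothetical example of a diagnostic query: instead of scoring people, it audits the department’s own records for repeat-stop locations and for disparities in search and hit rates. All column names and data are invented.

```python
# Hypothetical diagnostic audit of a department's own stop records (invented data).
import pandas as pd

stops = pd.DataFrame({
    "location":       ["4th & Main", "4th & Main", "Oak Plaza", "4th & Main", "Elm St", "Oak Plaza"],
    "subject_race":   ["Black", "White", "Black", "Black", "White", "White"],
    "searched":       [True, False, True, False, False, True],
    "contraband_hit": [False, False, True, False, False, True],
})

# 1) Which locations generate repeated stops or calls, and might need a
#    non-enforcement fix (lighting, services, mediation)?
print(stops["location"].value_counts())

# 2) Are search rates disparate across groups, and do hit rates justify them?
#    A lower hit rate for a more-searched group suggests a lower evidentiary
#    bar is being applied to that group.
print(stops.groupby("subject_race")["searched"].mean())
print(stops[stops["searched"]].groupby("subject_race")["contraband_hit"].mean())
```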

These diagnostic approaches embrace both asymmetries—they demand accountability by showing what police actually did rather than what an algorithm predicted, and they minimize error by measuring real outcomes rather than guessing at futures.

Oakland’s Ceasefire program demonstrates the diagnostic approach in practice. The program analyzed shooting data to understand who was involved in gun violence and found that roughly 400 people—0.1% of Oakland’s population—were at highest risk of being involved in shootings. But the intervention wasn’t surveillance. Instead, the program provided guided outreach, job training, mentoring, and other support services. Analysis by Northeastern University showed a 31.5% reduction in gun homicides and a 52% drop in shootings with victims. The program succeeded not because it tried to predict who would be a bad actor and prevent individual acts, but because it used data to understand the problem and structure an intervention that addressed root causes.

Richmond’s Office of Neighborhood Safety took a similar diagnostic approach. They identified a small group disproportionately involved in firearm activity and provided intensive non-enforcement outreach through the Peacemaker Fellowship—stipends, mentorship, life coaching. The program showed significant drops in firearm deaths and hospital visits. If you contrast these methods with Chicago’s Strategic Subject List, the difference becomes clear.

Oakland and Richmond used data as a mirror to guide engagement. Chicago tried to use it as a crystal ball to justify surveillance.

A Framework for Evaluating Predictive Systems

When vendors claim to have solved the problems of predictive policing, there are specific questions that procurement officers and police leadership should ask. What exactly is being predicted? Don’t accept vague terms like “risk of violence” or “criminal activity.” Push for specifics: Are you predicting arrests? Victimization? Perpetration? The distinction matters because it reveals whether the system is predicting crime or police activity. What are the false positive and false negative rates? How often does it flag someone who won’t actually commit a crime? How frequently does it miss people who will? Has this been independently evaluated through peer-reviewed research? Can the methodology be shared for external review?

Be cautious about vendors claiming their methodology is proprietary or can’t be shared for security reasons. Without the ability to verify independently, you’re putting substantial weight on vendor claims. How will the intervention change underlying conduct? Will an increased police presence lead to more arrests, which in turn feed back into the model, making validation a self-fulfilling prophecy? Does the training data match your context? A model trained in Los Angeles may not transfer to suburban New Jersey. Crime patterns differ. Police practices differ. The model’s assumptions must align with your operational reality. Finally, ask how people can contest decisions. Can they learn they’ve been selected? Can they see what data drove the decision? Is there a process to correct errors? 
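Vendors rarely volunteer these numbers unprompted, but they are easy to demand in concrete form. Below is a hypothetical sketch of the minimum error accounting a procurement team should request, given the system’s flags and the outcomes later observed; the counts are invented for illustration.

```python
# Hypothetical sketch: the minimum error accounting to demand from a vendor.
# All counts below are invented for illustration.

def error_report(true_pos, false_pos, false_neg, true_neg):
    """Summarize a confusion matrix in the terms that matter for policing."""
    flagged = true_pos + false_pos
    return {
        "people_flagged": flagged,
        "flags_that_were_wrong": false_pos,
        "precision": round(true_pos / flagged, 3) if flagged else None,
        "false_positive_rate": round(false_pos / (false_pos + true_neg), 4),  # innocent people flagged
        "false_negative_rate": round(false_neg / (false_neg + true_pos), 4),  # real incidents missed
    }

# Example: a system flags 1,000 people, 50 of whom are later involved in the
# predicted outcome, in a jurisdiction of 100,000 where 200 such incidents occur.
print(error_report(true_pos=50, false_pos=950, false_neg=150, true_neg=98_850))
```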

Why Police Departments Keep Buying Systems That Don’t Work

The persistence of predictive policing, despite consistent evidence of its failure, requires explanation. Part of the answer lies in vendors who exploit the seductive appeal of a simple technological “fix.” Department leadership faces intense political pressure to respond to crime with visible action. There’s also an asymmetry in political consequences. When someone is released and commits a crime, that becomes a visible failure that generates intense scrutiny. When hundreds or thousands of people are subjected to unjustified surveillance because an algorithm flagged them, those harms are more diffuse and less politically salient. This creates natural pressure to accept more false positives to avoid false negatives, even when the false positive rate reaches 99%.

The feedback loops in these systems also obscure their failures. When police go to neighborhoods the algorithm predicts will have crime, they make arrests. Those arrests generate data that validates the algorithm’s predictions. The system appears to work because it’s being measured by its ability to match police activity patterns, not its ability to identify actual crime. Vendors can claim high accuracy rates because they’re measuring the wrong thing—how well the system reproduces biased training data rather than how well it predicts actual criminal activity.

Conclusion: The Path Forward

The failure of predictive policing is not a failure of data. It’s a failure in how we’ve chosen to use data.

Data can function as a mirror that shows what’s happening, reveals patterns and disparities, helps hold systems accountable, and enables learning. Or data can be used to create a crystal ball that claims to predict the future, promises capabilities it cannot deliver, and leads departments down paths that waste money, create legal liability, damage community trust, and reinforce existing biases.

What is needed are better mirrors: data systems that show what is actually working, who is being affected, how outcomes can be improved, and how to address root problems, using technological capabilities to make policing more efficient, effective, and responsive to underlying community concerns.

The choice isn’t between using data and not using data. The choice is between diagnostic systems that enable accountability and learning, and predictive systems that promise the impossible while perpetuating the patterns they claim to disrupt. We should not continue funding algorithmic fortune-telling that harms communities while failing to prevent crime. There are alternatives. The question is whether we have the courage to choose accuracy and accountability over the seductive promise of prediction.

I thank Sayash Kapoor, Angelina Wang, Solon Barocas, and Arvind Narayanan for numerous discussions and several joint presentations on the policy implications of their "Against Predictive Optimization" paper. Working with them has significantly informed my understanding of the technical flaws in predictive policing.
