
Most AI tools are built behind closed doors. The rules that shape their behavior usually live in complex code, hidden prompts, or opaque models that users never see. But on the AIEP project, we took a different approach. We decided to make those hidden rules transparent and invited the families who actually need the tool to help us draft them.

We call this approach “Unboxing the Prompt.”

More than 7.5 million children with disabilities in the U.S. are eligible to receive special education services; that’s nearly 15% of all students, or roughly 1 in every 7 kids. For each of them, an Individualized Education Program (IEP) determines what support they will receive in school, such as speech or occupational therapy, extra time on tests, or help from a classroom aide.

But an IEP can run 50 to 100 pages of specialized terminology, dense tables, and legalese that even native English speakers struggle to parse. For families whose primary language isn't English, the challenge of understanding the document, let alone advocating for their child, is even greater.

Miss a detail or misunderstand a term, and a family might lose out on a life-altering service.

That is the problem AIEP exists to tackle. 


The AIEP Landing Page

As the lead engineer on the AIEP project, I, along with my colleagues in the AI for Impact initiative, set out to do more than make IEPs intelligible. We wanted to help parents become active participants in their child’s educational journey.

This post describes how we built that tool by grounding technical decisions in parent feedback and using AI to prototype quickly.

For me, this was a shift in mindset: I learned that building "for" users wasn't enough; to get this right, we had to build "with" them.

The first prototype worked, and it failed

In January 2025, our architecture was simple: take the PDF of an IEP and send it to an LLM with one instruction, “Please summarize this document.” It worked in the way prototypes often work. It produced clean paragraphs. It sounded confident. It was fast.

Then we put it in front of people who live with IEPs in the real world. The feedback was immediate: a generic summary is not what parents need. Parents are not reading an IEP for fun. They are trying to answer specific questions, often under time pressure:

  • How many minutes of services does my child get each week, and in what setting?
  • What accommodations are guaranteed?
  • What goals is the school committing to work on?
  • Who are the key people on the team?
  • What placement was decided, and what does it actually mean?

Those answers exist in the document, but they are buried deep. Sometimes in a dense table. Sometimes written in a way that assumes you already speak the system’s language.

That feedback led us to abandon the “one prompt” approach. It also gave us our core design principle:

AI is good at summarizing, but the product families want is not an AI summary. It is access to the specific, legally meaningful details parents need to advocate for their child.

Everything that followed came from that insight.
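For the record, that abandoned first version was barely an architecture at all. A minimal sketch (the OpenAI-style client and model name here are illustrative assumptions, not our actual stack):

```python
# Sketch of the January 2025 prototype: one document, one prompt.
# The client and model shown are illustrative, not our actual stack.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def summarize_iep(iep_text: str) -> str:
    """Send the full IEP text to an LLM with a single instruction."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{
            "role": "user",
            "content": f"Please summarize this document.\n\n{iep_text}",
        }],
    )
    return response.choices[0].message.content
```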


Early Prototypes of the AIEP Summary pages


New summary pages in the AIEP Application

Reading an IEP

IEPs are rarely clean PDFs. Many are scanned. Many have complex tables, checkboxes, and inconsistent formatting. Before we could reliably extract anything, we had to solve a less glamorous problem: converting messy, real-world documents into something an LLM could read.

We tested and compared several Optical Character Recognition (OCR) tools. OCR is the process that converts an image or scanned PDF into text. Many tools returned output that looked fine at a glance but “broke” under pressure: scrambled tables, lost headers, and missing structure that made downstream extraction unreliable.

What changed things for us was using a vision-based OCR approach (we used Mistral OCR). Instead of only reading characters, it preserved layout and structure well enough to produce clean, well-structured Markdown, which helped us reliably extract the source text.
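In practice, that step looks roughly like this (a sketch using the mistralai Python SDK; the function name and surrounding plumbing are simplified from our pipeline):

```python
# Sketch: convert a scanned IEP into layout-preserving Markdown with
# Mistral's OCR API. Simplified from our pipeline; error handling omitted.
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

def iep_pdf_to_markdown(pdf_url: str) -> str:
    """OCR a (possibly scanned) IEP PDF and return one Markdown string."""
    ocr_response = client.ocr.process(
        model="mistral-ocr-latest",
        document={"type": "document_url", "document_url": pdf_url},
    )
    # Each page comes back as Markdown with tables and headers preserved,
    # which is what made downstream extraction reliable for us.
    return "\n\n".join(page.markdown for page in ocr_response.pages)
```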

Privacy by Design

An IEP contains highly sensitive Personally Identifiable Information (PII) such as names, phone numbers, Social Security numbers (SSNs), and addresses. From the very beginning, parents expressed deep concern about privacy, so we were keen to build privacy into the system's design.

We integrated Amazon Comprehend, a machine learning service that analyzes text to identify specific types of information, as a dedicated redaction layer. Before any extracted text is used for analysis, it is processed to detect and redact PII such as SSNs, addresses, and contact information. Immediately after processing, we delete the IEP PDF from our system and retain only the redacted output, so a parent can log in later and still access their results. Parents can also delete even that minimal information with a single button click in the app.
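Conceptually, the redaction layer is a thin wrapper around Comprehend's PII detection. A sketch using boto3 (chunking for the API's input size limit, retries, and audit logging are omitted):

```python
# Sketch: redact the PII spans Amazon Comprehend detects in extracted text.
# The real pipeline adds chunking, retries, and logging.
import boto3

comprehend = boto3.client("comprehend")

def redact_pii(text: str) -> str:
    """Replace each detected PII span with a placeholder like [SSN]."""
    entities = comprehend.detect_pii_entities(
        Text=text, LanguageCode="en"
    )["Entities"]
    # Replace spans from the end of the text so earlier offsets stay valid.
    for e in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        text = text[:e["BeginOffset"]] + f"[{e['Type']}]" + text[e["EndOffset"]:]
    return text
```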

A summary is helpful, but sections make it usable

To decide which specific information to extract, we started with what caregivers told us they needed and created an initial organizing taxonomy: Goals, Services, and Accommodations. Then parents and special education experts told us they also wanted:

  • Present Levels, the section that describes how a student is doing right now across academic, social-emotional, and functional skills
  • Placement details, which explain where instruction will happen and what setting the student will learn in
  • Key People, which lists the team members involved and who is responsible for what

Over time, we evolved the organizing scheme into ten distinct sections that match how parents actually use IEPs.
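In code, that scheme becomes a structured extraction target for the model to fill in, section by section. A simplified sketch (the field names are illustrative, and only some of the ten sections are shown):

```python
# Sketch: the extraction target for a few of the ten sections.
# Field names are illustrative; the production schema is richer.
from dataclasses import dataclass, field

@dataclass
class Service:
    name: str             # e.g. "Speech Therapy"
    minutes_per_week: int
    setting: str          # e.g. "Separate classroom"

@dataclass
class IepSummary:
    present_levels: str
    placement: str
    student_strengths: list[str] = field(default_factory=list)
    goals: list[str] = field(default_factory=list)
    services: list[Service] = field(default_factory=list)
    accommodations: list[str] = field(default_factory=list)
    key_people: dict[str, str] = field(default_factory=dict)  # role -> name
    # ...plus the remaining sections in the full application
```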

This is how “feedback to feature” became a repeatable loop:

  • A parent points to something confusing or hard to find
  • We translate that into an extraction requirement
  • We implement it as a prompt rule, a data structure, and a UI output
  • We bring it back and ask, “Does this actually help?”

But as we added more sections, reliability became the next challenge.

To pull out specific, legally meaningful details from an IEP, we had to run the document through several AI steps in sequence. That made the system fragile: responses could time out, come back incomplete, or change subtly. If any one step failed, the entire process stopped.

To make the system more robust, we used AWS Step Functions, a service that lets us coordinate multiple tasks into a visual workflow. It managed the sequence of AI tasks, giving us predictable retries when an error occurred and a clear view of exactly how the data was being processed.
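The workflow is declared as a state machine in Amazon States Language. A sketch of a fragment (every ARN here is a placeholder, and the real workflow has more steps):

```python
# Sketch: a fragment of the Step Functions definition as a Python dict,
# later serialized with json.dumps() and passed to
# stepfunctions.create_state_machine(). All ARNs are placeholders.
definition = {
    "StartAt": "OcrDocument",
    "States": {
        "OcrDocument": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:OcrDocument",
            "Retry": [{
                "ErrorEquals": ["States.TaskFailed", "States.Timeout"],
                "IntervalSeconds": 5,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,  # wait 5s, 10s, 20s between attempts
            }],
            "Next": "RedactPii",
        },
        "RedactPii": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:RedactPii",
            "Next": "ExtractSections",
        },
        "ExtractSections": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:ExtractSections",
            "End": True,
        },
    },
}
```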

“Unboxing the prompt” changed who could help us build

The biggest shift in the entire project did not come from a new LLM or a clever architecture change: it came from listening to parents.

In focus groups, caregivers, special education experts, and educators offered very specific feedback on what was unclear, missing, or could be improved. Acting on that feedback often meant changing the instructions we gave the AI for each section.

Because those instructions were written in plain English, we made a simple decision: we showed them directly to the people using the product.

That moment is what we now call Unboxing the Prompt.

Three years ago, doing this would have been unrealistic. If you were building a traditional machine-learning system, the “core logic” would be buried in training pipelines, feature engineering, and complex code. Feedback from non-technical users would have to be translated by engineers into technical changes, with lots of interpretation in between.

But here, the core of the product is plain-language instructions. The rules the AI follows are readable. That means our users can critique them directly, without needing to become developers.

And they did.

Two examples that changed the product:

1) The “service minutes” logic
An IEP might say: “150 minutes per week.” Parents told us that the number is hard to visualize. Is that daily? Weekly? Is it broken up over several days, or is it in a single session?

So we changed the prompt to have the tool present the breakdown in human terms, for example: “2 hours and 30 minutes per week, about 30 minutes per school day.” It is a small change technically. It is a big change in usability.
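The entire fix lives in the prompt, not in code. Hypothetical wording, but representative of the kind of rule we add:

```python
# Sketch: the kind of plain-language rule appended to the Services
# section prompt. Representative wording, not our exact production prompt.
SERVICE_MINUTES_RULE = """
When reporting service minutes, restate the total in human terms.
Convert minutes to hours and minutes, and estimate the per-school-day
amount assuming a five-day week. Example: "150 minutes per week"
becomes "2 hours and 30 minutes per week, about 30 minutes per school day."
"""
```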

2) A strengths-first experience
Parents pointed out something deeper: many IEPs are overly deficit-focused. Even when a child’s strengths are present, they are not foregrounded.

So we added a dedicated “Student Strengths” section and adjusted the order and tone so the output begins with what the child does well, before moving into needs and services. That changed the emotional experience of reading the output, not just the content.

Unboxing the prompt lets parents become co-designers of the system’s core logic.

That is not a nice-to-have. It is a shift in what building products can look like.

Translation that preserves meaning, not just words

For many families whose home language isn’t English, English-only documents can be a barrier to participating in and advocating for their child’s education. But translating an IEP is not like translating casual text. Terms like “placement,” “accommodation,” and “service minutes” carry legal and cultural meaning, and a literal translation can be misleading.

So we built translation as its own step, tailored to each family’s language, and anchored it in a carefully curated set of IEP terms.

Our tool currently supports Spanish, Vietnamese, and Chinese, the most commonly spoken languages other than English in San Francisco, where we piloted AIEP. To ensure complex terminology remained accurate, human translators manually translated 300+ high-impact IEP terms for each supported language, creating a term bank that the AI uses as guidance during translation. The goal is not just “words in another language,” but an output that stays faithful to meaning and intent.
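Mechanically, the term bank travels with every translation request. A sketch with illustrative entries and prompt wording (the real bank holds 300+ human-translated terms per language):

```python
# Sketch: anchoring translation in the curated term bank.
# Entries are illustrative; the real bank has 300+ terms per language.
SPANISH_TERM_BANK = {
    "placement": "ubicación",
    "accommodation": "adaptación",
    "service minutes": "minutos de servicio",
}

def build_translation_prompt(summary_text: str, term_bank: dict[str, str]) -> str:
    """Build a prompt that pins special education terms to human translations."""
    glossary = "\n".join(f"- {en} -> {es}" for en, es in term_bank.items())
    return (
        "Translate the following IEP summary into Spanish. Use exactly "
        "these translations for special education terms, preserving their "
        "legal meaning:\n"
        f"{glossary}\n\n{summary_text}"
    )
```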

We also built a glossary to support understanding while reading. In the design, complex IEP terms are highlighted in blue with an underline, and clicking a term reveals a plain-language definition that explains what it means in practice. More than once, I found myself clicking on a term to learn what it meant.

IEPs are packed with acronyms and abbreviations that are second nature to school systems but foreign to families. We added a dedicated Abbreviations section that collects and expands the shorthand used in the document, such as IEP (Individualized Education Program), OT (Occupational Therapy), SLP (Speech-Language Pathologist), and FAPE (Free Appropriate Public Education).

AI changed how quickly we could learn and build

For community-engaged AI to work well, the feedback loop must be short enough that people see their input reflected back to them. AI tools changed the pace of prototyping for us.

  • We could turn workshop feedback into a prompt revision quickly
  • We could generate a prototype output, bring it back, and ask, “Is this clearer?”
  • We could iterate repeatedly over the course of a year, instead of shipping one big “version 1” months later

That speed is what made unboxing the prompt meaningful. It kept parents engaged throughout the development process, not just at the beginning. By December 2025, the tool had grown from a rough prototype into a much more polished product, shaped by repeated feedback and refinement.

To learn more about the AIEP project, read When Communities Lead, Appropriate Tech and Change Follow and Designing AI With Communities: the AIEP Project.
