Turning 20 Years of Community Board Data Into Searchable Public Knowledge

For years, the NYC Department of Transportation (DOT) has been running public meetings on a Canal Street Safety Improvement Plan following several major crashes and pedestrian fatalities.

Manhattan Community Board 3 (MCB3), the local community board whose district covers part of Canal Street, has been discussing versions of the proposal for just as long.

Resolutions tied to “Reimagining Canal Street” stretch back to at least 2019.

But trying to figure out what the community board has said over time — and how it has engaged with DOT across years of meetings and resolutions — is much harder than it should be.

That difficulty is not just frustrating. It changes who can realistically participate in local government.

Across MCB3’s publicly available meeting notes dating back to January 2002, there are more than 250 PDF documents containing thousands of resolutions and nearly two million words of public discussion.

The (partial) list of MCB3 full board minutes, each saved as a separate PDF containing resolutions and other meeting text.

Tracking a single issue across those archives often means manually opening and searching dozens of disconnected PDFs. In practice, this makes the history of MCB3’s discussions on any topic inaccessible to all but the most determined citizens.

Even experienced community board members may not realize that years of institutional memory lie behind a current proposal. To connect the dots between meetings, residents and board members alike would have to manually open dozens of files with no reliable way to search across them.

Local government produces enormous amounts of public knowledge. Too often, that knowledge is technically public while remaining practically inaccessible.

That is the problem we wanted to solve with Block Party.

Introducing Block Party’s Resolution Archive

Searching the keywords “Canal Street redesign” in Block Party’s resolution archive immediately surfaces years of resolutions and debate, showing how long the proposal has been under discussion and what MCB3 previously requested from DOT.

Before this project, there was no centralized archive of resolutions, no search function across meeting notes, and no easy way to quickly understand what the board had previously said about a topic.

The archive also includes summarization tools designed to make procedural government language easier to navigate.

We created summaries from the full text of each resolution to make the information more accessible.

Users can still access the complete resolution text through a toggle on each page. If they want additional context, they can open the original meeting PDF or navigate to the transcript archive to locate the recorded meeting discussion connected to a resolution.

The same resolution shown above, summarized by Block Party.

What emerged was a way to trace how public decisions evolve over time — across meetings, resolutions, negotiations, and institutional memory that had previously been buried in disconnected archives.

Paired with Block Party’s existing transcript archive of community board meetings uploaded to YouTube, years of public records and civic discussion are now searchable in one place.

See the CB resolutions knowledge base here!

When Building with AI, Harness Your Human Experience

We see AI as a powerful new tool to add to your toolbelt. But this project only worked because it combined AI capabilities with years of experience working with community board data and local civic engagement.

For this use case, we moved from idea to prototype to presentation at Open Data Week within weeks because we already understood the data, the workflows, and the public problem we were trying to solve.

As a small team of volunteer civic techies, we had everything we needed to get started.

One of us had years of experience developing Natural Language Processing (NLP) and database solutions for community board data. Another was an active community board member and local civic engagement journalist who understood firsthand the frustrations and pain points of navigating these records.

We also had a full-stack engineer who could quickly build the user interface.

It helped that Block Party already had experience and infrastructure in place for semantic search and text extraction.

When we first built Block Party in 2019 — before ChatGPT was a thing — our goal was to turn hours of public community board meetings into a digestible two-minute read. We built a tool that generated transcripts from closed-caption YouTube recordings and made summaries to send as a free weekly email. This enabled our curious subscribers to stay informed about the discussion highlights.

Over time, we experimented with semantic search across meeting transcripts so users could “vibe search” civic discussions.

The approach worked well for our demo, but meeting transcripts proved difficult source material. Community board meetings are long and may cover completely unrelated topics in a single session. The average meeting lasts more than two hours and has no standard structure. Because transcripts are generated through speech-to-text systems, even names and locations are not always reliable.

That earlier work helped us recognize that resolution data was a much stronger fit for semantic search.

Each resolution is relatively short, focused on a single issue, and follows a predictable structure built around TITLE, WHEREAS, and THEREFORE BE IT RESOLVED clauses. Because the text is written and published directly by humans, we could trust locations and entity names far more than we could in transcript data.

Those three traits — narrow focus, consistent structure, and trustworthy sourcing — made resolutions an unusually strong fit for semantic search.

Building the Resolution Archive

To build the archive, we:

Scraped and structured resolution text from public PDFs
Broke resolutions into searchable sections
Generated summaries to make procedural language easier to understand
Applied semantic search so users could search by topic rather than exact keywords
Linked every result back to the source documents

The biggest challenge was not the AI itself. It was the structure of the public records.

For our prototype, which covered roughly 20 years of resolution PDFs for Community Board 3, the main challenge was that each meeting PDF could contain more than 10 separate resolutions, and formatting conventions changed repeatedly over the years. Document structure, vote tally conventions, and resolution formatting all varied by era.

We needed to extract all the relevant data per resolution using the contextual structure of the TITLE clauses, WHEREAS clauses, and THEREFORE BE IT RESOLVED sections. Over time, we also noticed that the number and structure of those sections changed across different PDF eras.

When we first built the parser, we relied heavily on the LLM to split the text into resolutions and clauses.

At first glance, it looked like it worked.

But once we started reviewing the outputs closely and tinkering with the summarization results, something seemed off. We performed some exploratory analysis of the extracted data and realized that the parser was splitting the text into too many incomplete chunks and dropping important sections of the resolutions.

That turned out to be a useful reminder: always read through the source data yourself. Reading the original PDF, and then reading it again as raw extracted text, helps build intuition about what the system is seeing.

With about two and a half weeks left before our Open Data Week presentation, we quickly iterated on the approach. Leaning on our old-school regex roots, we shifted to a two-stage parsing process that first identified complete resolutions based on TITLE clauses and then broke them into WHEREAS and THEREFORE BE IT RESOLVED sections.

Extracting the full text of each resolution first, then chunking it, worked much better.
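The two-stage idea can be sketched roughly as follows. This is a minimal illustration, not the parser Block Party actually built: the regular expressions, the function names split_resolutions and split_clauses, and the sample minutes text are all assumptions made for the example, and real minutes would need patterns tuned to each PDF era.

```python
import re

# Assumption: every resolution begins with a line starting "TITLE:".
TITLE_RE = re.compile(r"^[ \t]*TITLE:", re.MULTILINE)
# Assumption: clauses begin at the start of a line with one of these markers.
CLAUSE_RE = re.compile(
    r"^[ \t]*(WHEREAS|THEREFORE,? BE IT (?:FURTHER )?RESOLVED)\b",
    re.MULTILINE,
)

# Made-up sample text, for illustration only.
SAMPLE = """Minutes header text that belongs to no resolution.
TITLE: Canal Street Safety Improvements
WHEREAS, crashes on Canal Street have injured pedestrians; and
WHEREAS, DOT presented a redesign proposal;
THEREFORE BE IT RESOLVED that the board requests a follow-up presentation.
TITLE: Sidewalk Cafe Application
WHEREAS, the applicant agreed to reduced hours;
THEREFORE BE IT RESOLVED that the board approves the application.
"""


def split_resolutions(text):
    """Stage 1: cut the meeting text into complete resolutions at each TITLE."""
    starts = [m.start() for m in TITLE_RE.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[s:e].strip() for s, e in zip(starts, ends)]


def split_clauses(resolution):
    """Stage 2: break one resolution into its title and clause lists."""
    marks = list(CLAUSE_RE.finditer(resolution))
    first = marks[0].start() if marks else len(resolution)
    title = re.sub(r"^\s*TITLE:\s*", "", resolution[:first]).strip()
    parsed = {"title": title, "whereas": [], "resolved": []}
    for i, mark in enumerate(marks):
        end = marks[i + 1].start() if i + 1 < len(marks) else len(resolution)
        # Collapse line breaks and repeated spaces inside the clause.
        body = " ".join(resolution[mark.start():end].split())
        key = "whereas" if mark.group(1) == "WHEREAS" else "resolved"
        parsed[key].append(body)
    return parsed


for resolution in split_resolutions(SAMPLE):
    print(split_clauses(resolution)["title"])
```

One appeal of splitting on TITLE first is that a malformed clause can only damage its own resolution, rather than shifting the boundaries of every resolution that follows it in the PDF.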

As we scaled up the resolution engine, we also added validation steps to the extraction process. We asked the LLM to flag suspicious parsing results based on human-defined thresholds, such as missing resolution sections, sparse extraction density, or signs of garbled OCR text.
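To make the kinds of checks described above concrete, here is a small sketch of threshold-based flagging. It uses plain Python heuristics rather than an LLM, and everything in it is an assumption for illustration: the function name flag_parse, the threshold values, the definition of extraction density as the share of raw text that survived parsing, and the character test used to guess at garbled OCR.

```python
# Illustrative thresholds only; the project's real values are not published here.
MIN_COVERAGE = 0.6        # parsed text should cover at least 60% of the raw text
MAX_ODD_CHAR_RATIO = 0.15  # more than 15% unusual characters suggests bad OCR

# Punctuation expected in ordinary resolution text (an assumption).
COMMON_PUNCTUATION = ".,;:'\"()-/&%$"


def flag_parse(parsed, raw_text):
    """Return a list of reasons this parsed resolution deserves a human look."""
    flags = []

    # 1. Missing sections.
    if not parsed.get("title"):
        flags.append("missing_title")
    if not parsed.get("whereas"):
        flags.append("missing_whereas")
    if not parsed.get("resolved"):
        flags.append("missing_resolved")

    if raw_text:
        # 2. Sparse extraction: how much of the raw text ended up in a section?
        kept = len(parsed.get("title", ""))
        kept += sum(len(c) for c in parsed.get("whereas", []))
        kept += sum(len(c) for c in parsed.get("resolved", []))
        if kept / len(raw_text) < MIN_COVERAGE:
            flags.append("low_coverage")

        # 3. Garbled OCR: share of characters that are neither alphanumeric,
        #    whitespace, nor common punctuation.
        odd = sum(
            1 for ch in raw_text
            if not (ch.isalnum() or ch.isspace() or ch in COMMON_PUNCTUATION)
        )
        if odd / len(raw_text) > MAX_ODD_CHAR_RATIO:
            flags.append("possible_garbled_ocr")

    return flags
```

An empty list means nothing tripped a threshold, not that the parse is correct. Flagged items would still go to a person for review.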

Human validation proved essential throughout the process. We only caught many of these issues because people manually reviewed outputs and refined the system iteratively.

One important design decision was determining what users should receive in response to a search query.

When someone searches “Canal Street redesign,” should the system return an isolated sentence, a clause, or an entire resolution?

We decided the full resolution made the most sense because it preserved the broader context and outcome of the discussion.
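One way to return whole resolutions while still searching over smaller sections is to roll section-level matches up to their parent resolution. The sketch below assumes a search backend that yields scored section hits. The ChunkHit class, the rank_resolutions function, and the identifier format are hypothetical names introduced for illustration, and taking the best section score per resolution is just one possible ranking rule.

```python
from dataclasses import dataclass


@dataclass
class ChunkHit:
    resolution_id: str  # hypothetical identifier for the parent resolution
    section: str        # for example "whereas" or "resolved"
    score: float        # similarity score from the search backend


def rank_resolutions(hits, full_text_by_id, limit=10):
    """Collapse section-level hits into one result per resolution, best first."""
    best = {}
    for hit in hits:
        current = best.get(hit.resolution_id)
        # Keep only the strongest matching section for each resolution.
        if current is None or hit.score > current.score:
            best[hit.resolution_id] = hit

    ranked = sorted(best.values(), key=lambda h: h.score, reverse=True)
    return [
        {
            "resolution_id": h.resolution_id,
            "matched_section": h.section,
            "score": h.score,
            "text": full_text_by_id[h.resolution_id],  # the complete resolution
        }
        for h in ranked[:limit]
    ]
```

With this shape, a query that matches both a WHEREAS clause and a RESOLVED clause of the same resolution produces a single result carrying the full text, instead of two disconnected fragments.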

That decision also shaped how we approached summarization. Rather than summarizing arbitrary text fragments, we used the structure of the resolutions themselves to guide the summaries. We separately processed TITLE, WHEREAS, and THEREFORE BE IT RESOLVED sections so the summaries could preserve both the context behind the issue and the action the board took.

That nuance mattered. Community board resolutions are not just collections of keywords. They are procedural documents with internal logic, political context, and formal outcomes. Preserving that structure made the semantic search substantially more useful.

Key Takeaways

Start with domain expertise. The people closest to the problem are usually best positioned to judge whether an AI system is useful.
Build iteratively and gut-check the data constantly. Understanding how your public data is generated and structured matters as much as the model itself.
Share your work publicly. This project exists because people in the civic technology community shared ideas, collaborated, and connected across disciplines.
Keep the process human. AI helped us move faster, but the real value came from understanding the civic problem we were trying to solve and building around the needs of residents, journalists, and community board members.

Local government produces enormous amounts of public knowledge. Too often, that knowledge is technically public while remaining practically inaccessible.

The goal must be to make it easier for residents, journalists, and public officials to understand how local decisions get made. We think tools like Block Party can help sustain civic participation by making institutional memory easier to search, understand, and use.
