Research Radar: AI Speeds Up Government Consultation Analysis Without Sacrificing Quality
New Article: Consult Evaluation: Scottish Government's Non-surgical cosmetic procedures consultation
Question Asked: Can AI effectively automate the analysis of public consultation responses while maintaining the quality and accuracy needed for policy decision-making?
Authors: Makinson, Lowe, Moore, Pryse-Davies, Banton, French, Ryall, Menezes, Robinson, Webb, Punatar
Source: Incubator for AI Consult Evaluation Report
The UK's Incubator for Artificial Intelligence (i.AI) evaluated "Consult," an AI-powered tool designed to identify themes and classify responses in government consultations, using a live Scottish Government consultation on regulating non-surgical cosmetic procedures.
Significance: Government consultation analysis is enormously resource-intensive - the UK runs around 600 public consultations annually, some receiving over 100,000 responses, requiring hundreds of thousands of hours of civil service time or expensive external contractors. This creates significant delays in policy development while consuming substantial public resources. Consult represents a potential breakthrough in making democratic participation more efficient and responsive by dramatically reducing the time between public input and policy action.
Method: Consult uses a two-stage AI process with human oversight. First, it employs topic modeling to automatically identify common themes across all consultation responses (rather than just samples). Second, it uses Large Language Models to classify each response according to these themes. The evaluation compared AI performance against expert human reviewers, analyzed theme ranking differences, and conducted user research through observations, surveys, and interviews with policy professionals.
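To make the two-stage pattern concrete, here is a minimal Python sketch with the same shape: a topic model proposes candidate themes from every response, a classifier assigns each response to a theme, and the assignments are queued for human review. This is not the ThemeFinder or Consult code, and the report does not name the models used; the TF-IDF/NMF stage one and the scoring-rule stage two below are stand-ins chosen purely for illustration.

```python
# Illustrative sketch of the two-stage pattern described above: (1) derive candidate
# themes from ALL responses, (2) assign each response to a theme, then surface the
# assignments for human review. NOT the ThemeFinder/Consult implementation; classical
# topic modelling stands in for stage 1 and a simple scoring rule stands in for the
# LLM classification of stage 2.
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

responses = [
    "Practitioners should be licensed and inspected before offering fillers.",
    "Age limits are essential; under-18s should not receive these procedures.",
    "Regulation will raise costs for small clinics without improving safety.",
    "Mandatory training and licensing would protect the public from harm.",
]

# Stage 1: propose candidate themes from the full set of responses (not a sample).
vectoriser = TfidfVectorizer(stop_words="english")
doc_term = vectoriser.fit_transform(responses)
model = NMF(n_components=2, random_state=0)
doc_theme = model.fit_transform(doc_term)          # response-by-theme weights
terms = vectoriser.get_feature_names_out()

candidate_themes = []
for theme_weights in model.components_:
    top_terms = [terms[i] for i in theme_weights.argsort()[::-1][:3]]
    candidate_themes.append(", ".join(top_terms))  # reviewers would sign these off

# Stage 2: classify every response against the approved themes.
assignments = []
for idx, weights in enumerate(doc_theme):
    best = int(weights.argmax())
    assignments.append({"response": responses[idx], "theme": candidate_themes[best]})

# Human-in-the-loop step: reviewers verify or correct each assignment.
for row in assignments:
    print(f"[needs review] theme '{row['theme']}' <- {row['response'][:50]}...")
```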
Tools and Open Source Availability: The evaluation relied on two core tools developed by the UK Government’s Incubator for AI (i.AI): ThemeFinder, a Python package that handles theme generation and response classification using AI, and Consult, the user-facing application that supports human review and final analysis. Both tools are open source, enabling transparency, reuse, and adaptation by other governments or civic tech developers. The report does not specify which Large Language Models (LLMs) are used, leaving questions about the underlying AI architecture and its generalizability.
Experiment: The evaluation used a live Scottish Government consultation, with expert reviewers involved at two stages: theme generation sign-off (4 reviewers) and theme mapping verification (6 reviewers). Researchers measured alignment between AI and human theme assignments, calculated how differences affected overall theme rankings, tracked review times, and assessed user experience through the "human-in-the-loop" interface that allows reviewers to verify and correct AI classifications.
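The two headline measurements described above are straightforward to express in code. The sketch below, with invented responses and hypothetical theme labels, shows one way to compute per-response agreement and the rank correlation between AI-derived and human-corrected theme frequencies; it illustrates the evaluation's logic, not the evaluators' actual scripts.

```python
# Minimal sketch (not the evaluation's actual code) of the two headline checks:
# per-response agreement between AI and human theme assignments, and whether the
# disagreements shift the overall theme ranking. Data and theme labels are invented.
from collections import Counter
from scipy.stats import spearmanr

ai_labels    = ["licensing", "age_limits", "licensing", "costs", "licensing", "age_limits"]
human_labels = ["licensing", "age_limits", "safety",    "costs", "licensing", "age_limits"]

# Per-response agreement (the report cites 76% exact agreement with expert reviewers).
agreement = sum(a == h for a, h in zip(ai_labels, human_labels)) / len(ai_labels)

# Theme rankings: how often each theme appears under the AI vs the human mapping.
themes = sorted(set(ai_labels) | set(human_labels))
ai_counts, human_counts = Counter(ai_labels), Counter(human_labels)
ai_freqs    = [ai_counts[t] for t in themes]
human_freqs = [human_counts[t] for t in themes]

# A rank correlation close to 1.0 means the disagreements barely change which themes
# dominate - the quantity that actually drives policy recommendations.
rho, _ = spearmanr(ai_freqs, human_freqs)
print(f"agreement rate: {agreement:.0%}, theme-ranking correlation: {rho:.2f}")
```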
Findings: Consult achieved strong accuracy, matching human expert judgments 76% of the time and requiring no changes for 60% of responses. Differences between AI and human mappings had minimal impact on theme rankings - the key driver of policy recommendations. Review time averaged just 23 seconds per response. However, Consult struggled to identify missing themes (reviewers added new themes 17 times more often than the AI), and the theme approval process proved time-intensive; the team recommended placing greater trust in the AI-generated themes in future. Reviewers appreciated the reduced bias but wanted continued human oversight. Remaining research gaps include testing across different policy topics, cost analysis, and guidance on when AI analysis is appropriate.
Key Gaps: While the evaluation provides detailed performance metrics for the AI tool, it lacks contextual detail about the consultation’s substance, timing, and respondent demographics, which limits how far the results can be generalized.
Reflection for Democracy: This research demonstrates AI's potential to dramatically accelerate democratic responsiveness while maintaining analytical quality. By reducing consultation analysis time from months to weeks, Consult and processes like it could enable more frequent and comprehensive public engagement without overwhelming government resources. The tool's ability to process entire datasets rather than samples could surface minority viewpoints that might be missed in traditional analysis. However, the "human-in-the-loop" approach proves crucial - the technology augments rather than replaces human judgment, preserving democratic accountability while enhancing efficiency. The success suggests a future where governments can engage citizens more frequently and responsively, potentially strengthening democratic legitimacy through more accessible and timely policy development.