Really Awesome Gadget: The Democracy Rebooted Chatbot and Retrieval Augmented Generation

Beth Simone Noveck: Why don't we get started by having you both introduce yourselves?

I'm Róbert Bjarnason from the Citizens Foundation in Iceland.

Hello, I'm Stephan Schmidt, working for the Burnes Center and the GovLab, and I've been in very close interaction with Róbert, as documented on the Discord channel!

BSN: Fantastic. I'm glad that this project has brought the two of you together. So why don't we just start with a very simple introduction to what the project is that you've been working on?

Róbert: This was a really fun challenge. Think in terms of creating a chatbot that will have hundreds of different information sources, all covering similar topics. It's quite challenging, but in a good way. We are building technology to handle the complex use case where you have a lot of data and you need to be able to have a conversation with that data that makes sense.

BSN: So before we dive in, perhaps you can just offer a very basic explanation of a chatbot.

Róbert: Well, the idea is that you have data in various sources, different types of information, like blog posts, PDFs, case studies and so on. That data is already online: you can go to all those different sources, look at the documents, and even search them individually. A chatbot gives you the opportunity to effectively search through such a big data set, and also to make sense of the data in a way that's accessible to regular people.

BSN: This kind of “conversational agent,” as chatbots are sometimes called, allows me to ask questions of a set of documents and get a response that synthesizes the learnings from those documents. This is a really exciting advance over searching and having to read one document at a time. I just want to understand: when I ask a question of a chatbot like this, am I only getting answers based on the documents and information you've given it?

Róbert: ChatGPT is trained on a lot of content but it will not have all the case studies, blog posts, and so on, in detail for every project on the planet.

This approach allows you to bring your own data, your own documents, and give the model that context. So you can actually instruct this chatbot to only answer questions from your documents. It is pulling from your documents but also reasoning about them using the underlying large language model.

Stephan: What most users are familiar with is, you go online and you search for something. And when you go, let's say you go to one of the major search engines and type in your request, you get a list of results. Or a user goes to a specific website: let's say the Burnes Center or the GovLab. The added value of a chatbot is that it works with your raw universe of data but combines it with the power of ChatGPT to offer something much more specific for a given field. It's kind of like a search engine with the added value of AI technologies.

BSN: So help me to understand why you're spending so much time talking on Discord with one another, as opposed to using a readily available tool like Google's NotebookLM and just uploading all of those PDFs. What does it mean to do Retrieval Augmented Generation?

Stephan: ChatGPT tries to catch the whole world. But the data that funnels into it is opaque to us. By contrast, the data in the chatbot is much more transparent. Also, ChatGPT by itself pulls in too much data, so it takes a long time to reply.

Róbert: I absolutely agree with that. There are so many different types of information sources, and I think that requires more of a custom solution than what is on offer with the OpenAI assistant.

Róbert and Stephan: We are building an ingestion pipeline, comprising 30 different ChatGPT prompts that are helping with the organization of information.

When we upload data (PDF documents, websites, pure text) we are compressing the data and translating text into numerical vectors.
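To illustrate what "translating text into numerical vectors" can look like, here is a minimal sketch. It is not the project's code: a real pipeline would most likely call a learned embedding model, whereas this example assumes a simple word-count vector, and the function names `embed`, `cosine`, and `retrieve` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks whose vectors are closest to the question's vector."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "Participatory budgeting lets residents vote on city spending.",
    "The chatbot links back to the original source.",
]
best = retrieve("How do residents vote on spending?", chunks)
```

Once every chunk is a vector, finding the relevant passage becomes a matter of comparing numbers rather than matching exact phrases.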

When someone asks a question, we are applying 30 different prompts to the data automatically.

A completeness agent
A correctness agent
A hallucination agent

These AI agents rapidly run comparisons in parallel to evaluate what is the most relevant response. We are automating the process of ranking the content (one paragraph against another) to find the best response. We then output the content with a link back to the original source.
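The idea of ranking one paragraph against another, with comparisons run in parallel, can be sketched roughly as follows. This is only an illustration under stated assumptions: the judge here is a hypothetical word-overlap function standing in for an LLM prompt, the round-robin win count is one of several possible ranking schemes, and paragraphs are assumed to be distinct strings.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def overlap_judge(question: str, a: str, b: str) -> str:
    """Hypothetical judge: prefer the paragraph sharing more words with the
    question. A real pipeline would ask a language model to choose instead."""
    q_words = set(question.lower().split())

    def score(paragraph: str) -> int:
        return len(q_words & set(paragraph.lower().split()))

    return a if score(a) >= score(b) else b


def rank_pairwise(question: str, paragraphs: list[str], judge=overlap_judge) -> list[str]:
    """Compare every pair of paragraphs in parallel and sort by number of wins."""
    pairs = list(combinations(paragraphs, 2))
    with ThreadPoolExecutor() as pool:
        winners = list(pool.map(lambda pair: judge(question, *pair), pairs))
    wins = {p: 0 for p in paragraphs}  # assumes paragraphs are distinct
    for winner in winners:
        wins[winner] += 1
    return sorted(paragraphs, key=lambda p: wins[p], reverse=True)


ranking = rank_pairwise(
    "how can citizens vote",
    ["citizens can vote online", "the weather is nice", "citizens meet weekly"],
)
```

Because each comparison is independent of the others, they can all be dispatched at once, which is what makes this kind of ranking fast in practice.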

This is 50-70% faster than other approaches because we are compressing the content but then creating quality by running this rapid series of queries.

Normally, if you are just using ChatGPT as is, it makes mistakes: there's additional text, or some text is missing.

So we have this validation loop actually making sure that the content that has been produced in the cleaning is correct. We split large documents into small chunks. We need them to be in small chunks because when we are asking the chatbot about something we want it to retrieve the right part of the document.
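Splitting a large document into small chunks might look something like the sketch below. The chunk size, the word-based splitting, and the overlap between neighbouring chunks are all assumptions made for illustration; the interview does not say how the project sizes or separates its chunks.

```python
def chunk_words(text: str, max_words: int = 50, overlap: int = 10) -> list[str]:
    """Split text into chunks of at most max_words words.

    Neighbouring chunks share `overlap` words (an assumption of this sketch)
    so that a passage cut at a boundary keeps some surrounding context.
    """
    if overlap < 0 or max_words <= overlap:
        raise ValueError("max_words must be larger than a non-negative overlap")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this chunk already reaches the end of the document
    return chunks


document = " ".join(f"w{i}" for i in range(12))
pieces = chunk_words(document, max_words=5, overlap=2)
```

Smaller chunks make retrieval more precise, at the cost of each chunk carrying less of the context around it.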

Writing these sorts of chatbots has proven to be much harder than people thought. The main problem has been what people call hallucinations. You are talking to your data and suddenly there’s something completely new!

Imagine you have a blog post with three paragraphs. When you search your chatbot, it finds the text it is looking for in the last paragraph of your three-paragraph post. So it brings that into the chat engine and says, “Oh, here's the answer to your question.” But it brings in only the end of your blog post, and in that case ChatGPT will fill in the gaps.

You want to focus on the last paragraph but you don’t want that to lead to hallucinations.

BSN: I'm really keen to know, what can you do with this chatbot now and how is it going to evolve over time?

Róbert: Chatbots are an excellent way to help people find relevant information quickly. Our philosophy at the Citizens Foundation has been to look at ways to use technology to help governments and citizens make better decisions together.

This generation of generative AI has huge potential when it comes to assisting us to make better decisions because we live in a world overflowing with information. There’s a huge opportunity for AI to help democracy by helping us get around information overload.

BSN: In addition, tell me what kinds of questions can I ask the Reboot Democracy Chatbot?

Stephan: You can ask this chatbot about the specific projects it has been fed such as collective intelligence projects, participatory democracy case studies and more on tech and democracy. You can ask about content from the Reboot Democracy blog on AI and governance. So you could ask about participating in your local government and it refers back to the source.

Róbert: When the chatbot provides the answer, it provides a link back to the original information source and a little pop-up with a short snippet of the actual text. You can easily go into the original document.

BSN: Very, very exciting. Thank you so much for all your hard work on this. I'm really looking forward to seeing how it evolves as we load it with more information.
