
Research labs in both industry and academia have been running experiments that probe how well generative AI technologies can predict the responses of human communities to various stimuli and questions. This is a new type of research, one very much in its infancy, and dozens of papers exploring the domain are now coming out.

Through our new AI-Media Strategies Lab (AIMES Lab) at Northeastern University, we have been tracking this new line of inquiry, particularly to inform practitioners about frontier research and to provide guidance. AIMES Lab focuses on the use of AI technologies in media industries and provides evidence-based recommendations to organizations.

Large Language Models (LLMs), such as OpenAI’s ChatGPT or Anthropic’s Claude, have quickly worked their way into the workstreams of communications practitioners across the world. Since LLMs are intentionally designed to be helpful assistants, they will generally assist with any communications task asked of them, doing everything from drafting headlines to serving as human substitutes in survey research.

Of course, just because LLMs can assist with most tasks does not mean they should assist with every task. That said, it can be difficult to discern which tasks are appropriate for LLM assistance and which are not, especially as the artificial intelligence (AI) landscape rapidly evolves.

In our newly released paper, “AI Simulations of Audience Attitudes and Policy Preferences: ‘Silicon Sampling’ Guidance for Communications Practitioners,” we try to provide guidance for communications practitioners in at least one common workflow: survey research. We argue that LLMs are best used as complements for early-stage survey design tasks and are ill-suited as substitutes for humans in the actual survey-taking process.

Survey Designers

LLMs may provide the biggest benefit in the early stages of the survey pipeline as proofreaders, question editors, or information summarizers. Research shows that certain models can identify key components of survey questions and provide constructive feedback on their phrasing, helping practitioners catch errors or tighten ambiguous wording.

To get the most value out of LLMs at this stage, practitioners should be mindful of their prompt phrasing. LLMs tend to be sycophantic, trying to please users even at the expense of honest or accurate responses. This is potentially due to alignment, the part of training in which LLMs are taught to be helpful and harmless assistants.

To combat this, we recommend phrasing questions objectively and explicitly prompting for feedback. Practitioners should try prompts like “Why might this question be difficult for survey takers to understand?” or “What are three ways to improve the phrasing of this question?”
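For practitioners who prefer to script this step rather than work in a chat window, a minimal sketch of the pattern is below. It assumes the OpenAI Python SDK with an API key already configured; the model name and the draft question are illustrative placeholders, not recommendations.

```python
# Minimal sketch: asking an LLM for critique of a draft survey question.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY environment variable;
# the model name "gpt-4o" and the draft question are illustrative only.
from openai import OpenAI

client = OpenAI()

draft_question = (
    "Don't you agree that the city's confusing new recycling rules "
    "make it harder for residents to do the right thing?"
)

# Objective, feedback-seeking prompts reduce the chance of a sycophantic
# "looks great!" response.
prompt = (
    "You are reviewing a draft survey question.\n"
    f"Question: {draft_question}\n\n"
    "1. Why might this question be difficult for survey takers to understand?\n"
    "2. What are three ways to improve its phrasing?"
)

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```

The same pattern works for the cross-cultural checks described below; only the feedback prompt changes.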

LLMs can also be valuable when adapting surveys across cultures, particularly for highlighting phrases that may be interpreted differently across cultural contexts. Again, we recommend objective prompts for this task, such as “How might non-native English speakers interpret this question?” 

Survey Takers

LLMs may also be helpful as early-stage pilot-testers, especially for resource-constrained practitioners. By asking LLMs to take on different personas with varying demographic attributes and answer early-stage survey questions, practitioners can generate an initial dataset to assess hypotheses before running full-scale human surveys.

This is best used as a prioritization tool: questions that produce high answer variance or responses that cut against practitioners' priors are the most pertinent to put to real people.
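As a sketch of what this prioritization step can look like in practice, the example below simulates persona responses and flags questions with high answer variance. It assumes the OpenAI Python SDK; the personas, questions, model name, and 1-to-5 agreement scale are all illustrative assumptions, and the outputs should be treated as rough signals only.

```python
# Minimal sketch: persona-based pilot testing to prioritize survey questions.
# Assumes the OpenAI Python SDK; personas, questions, the model name, and
# the 1-5 agreement scale are illustrative assumptions, not recommendations.
from statistics import pvariance
from openai import OpenAI

client = OpenAI()

personas = [
    "a 22-year-old urban renter who follows local news on social media",
    "a 58-year-old suburban homeowner who reads a print newspaper",
    "a 35-year-old rural small-business owner with limited internet access",
]

questions = [
    "The city should expand curbside composting even if fees rise slightly.",
    "Local officials communicate clearly about changes to public services.",
]

def simulated_answer(persona: str, question: str) -> int:
    """Ask the model to answer as the persona on a 1-5 agreement scale."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[
            {"role": "system", "content": f"Answer as {persona}."},
            {"role": "user",
             "content": f"{question}\nReply with a single number from 1 "
                        "(strongly disagree) to 5 (strongly agree)."},
        ],
    )
    # Crude parsing of the first character; a real pipeline would validate.
    return int(response.choices[0].message.content.strip()[0])

# Higher variance across personas flags questions worth prioritizing in the
# human survey; this is a signal for prioritization, not a result.
for q in questions:
    answers = [simulated_answer(p, q) for p in personas]
    print(f"variance={pvariance(answers):.2f}  {q}")
```

Because the simulated answers are subject to the opinion-collapse and skew problems discussed below, the variance scores should only guide which questions get asked, never stand in for human answers.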

Practitioners should be careful when interpreting these data and validate them against an external source, such as existing human survey data or expert opinion. This human-substitute pilot testing should only be used during early-stage exploration and only to get initial signals.

We strongly caution against having LLMs actually take surveys in place of humans. After reviewing over 30 academic papers on silicon sampling, we find the evidence is clear: LLMs are unreliable human substitutes.

LLMs are unreliable because they consistently fail to capture the full distribution of human responses, frequently presenting a narrow range of opinions, especially on divisive topics like race or religion. This opinion collapse, like the aforementioned sycophancy, may be partially due to model alignment.

Paradoxically, on current political issues (gun rights, immigration, abortion), LLMs tend to overemphasize ideological differences, presenting groups as more polarized than they actually are. The opinions they represent are also likely skewed, and the direction of that skew seems to differ by topic.

And while LLM ‘public’ opinion on the aforementioned political topics is skewed toward the extremes, LLM ‘public’ opinion on certain scientific topics, like climate change, is closer to that of scientists than of the general public, making LLMs ill-suited for capturing true public opinion on those topics.

LLMs also stereotype certain demographic groups, either by exaggerating group differences or by failing to capture within-group variation. They should likely not be used as sources of truth, even in pilot-testing phases, for the following groups: Independents, non-Hispanic Black Americans, conservatives, nonbinary individuals, and people of Middle Eastern or Hispanic background.

Survey Interpreters

LLMs may be helpful for analyzing already-collected human survey data, either directly, by coding or classifying open-ended survey responses, or indirectly, by generating software code to analyze survey results. Practitioners should be cautious about privacy at this step.

Sharing human survey data, especially data that contains personally identifiable information, with interfaces like ChatGPT raises serious privacy concerns. If practitioners really want to use LLMs to interpret survey data, they should consider deploying open-source models on their own servers, although doing so requires significant additional resources.
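As one hedged illustration of what a privacy-preserving setup might look like, the sketch below codes open-ended responses with an open-source zero-shot classification model that runs entirely on local hardware via the Hugging Face transformers library. The model choice, theme labels, and example responses are illustrative assumptions, not an endorsement.

```python
# Minimal sketch: coding open-ended responses locally so survey data never
# leaves the practitioner's machine. Assumes the Hugging Face transformers
# library; "facebook/bart-large-mnli" is one commonly used zero-shot model,
# and the theme labels and responses are illustrative.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

themes = ["cost concerns", "trust in officials", "environmental impact"]

open_ended_responses = [
    "I'd support it if the fees weren't going up again this year.",
    "I just don't believe the council will follow through.",
]

for text in open_ended_responses:
    result = classifier(text, candidate_labels=themes)
    # The top-scoring label is a starting point for human coders, not a verdict.
    print(f"{result['labels'][0]:<25} {text}")
```

Even with a local model, the machine-assigned themes should be spot-checked by a human coder before they inform any conclusions.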

Our Recommendation: Take a Hybrid Approach

Given the current state of LLMs, we recommend a hybrid approach when incorporating them into the survey pipeline. LLMs are at their best when they are used as collaborative tools for refining questions, gathering early-stage signals, and indirectly assisting in post-hoc data analysis. When it comes to understanding what people actually think, there’s no substitute for asking real people. 

As practitioners continue to incorporate this new technology into their workflows, they should be mindful of design decisions, such as model alignment, that shape downstream model outputs, as well as the known limitations of LLMs and other AI technologies. Practitioners should be transparent about their LLM use and always rely on expert validation and common sense to sanity-check LLM outputs.

 By treating LLMs as powerful but imperfect assistants that enhance rather than replace human insight, communications professionals can leverage AI's benefits while maintaining the integrity of their work. 

 
