Will AI Agents Replace Human Subjects in Social Science?

In a groundbreaking study that reads like a blueprint for the future of social science, researchers have unveiled an innovative use of artificial intelligence to simulate the behavior, attitudes, and decisions of over 1,000 people.

Drawing from detailed qualitative interviews, these “generative agents” replicated participants’ responses with uncanny accuracy, raising the question of whether AI might one day render human recruitment for psychological and social science research obsolete.

By feeding the transcripts of extensive interviews into a large language model, the researchers created AI-driven agents capable of emulating the responses of real people to various surveys and experiments.

On the General Social Survey (GSS), a widely used sociological survey that assesses attitudes and beliefs, the agents replicated the answers of the 1,052 human participants 85% as accurately as the participants replicated their own answers two weeks later.

That benchmark matters because people are not perfectly consistent when they retake the same survey, so the agents are scored against how reliably humans reproduce their own responses.

The study, conducted by a consortium of scholars from Stanford, Northwestern, Google DeepMind, and the University of Washington, introduces a new paradigm for human behavioral simulation.

The study was posted on November 15, 2024, to arXiv.org, an open-access repository for academic research hosted by Cornell University.

Researchers in fields like computer science, physics, mathematics, and the social sciences widely use arXiv to share preprints of work that has not yet undergone peer review.

The Promise of Simulated Research

For decades, social scientists have relied on labor-intensive methods to recruit diverse participants, administer surveys, and run experiments.

While these traditional approaches provide valuable insights, they also come with significant costs and logistical hurdles.

By offering a scalable, ethical alternative, generative agents could become a powerful tool for exploring human behavior on a massive scale.

Imagine testing public health messages, economic policies, marketing campaigns, or educational programs across thousands of simulated people representing diverse demographic groups, all without scheduling a single in-person session or paying hefty participation fees.

The authors describe this as creating “a laboratory for researchers to test a broad set of interventions and theories.”

Revolutionizing Psychology and Personality Research

Psychology and personality research have long relied on painstaking methods to gather data from human participants.

Studies often involve surveys, interviews, or experiments conducted in laboratory or virtual settings, requiring significant investments of time, labor, and money.

These tasks typically involve many researchers and assistants.

Participants, in turn, must dedicate their time, often over multiple sessions, leading to logistical challenges and high costs for compensation.

Beyond base compensation, some studies offer additional payments for follow-up sessions or incentives tied to economic games.

Including wages for researchers, software costs, and institutional overhead, a study with roughly 1,000 participants could easily run into six figures and take months, or even years, to complete.

Using AI-driven generative agents offers a transformative alternative.

Instead of recruiting and surveying people, researchers could program these agents with data from previous interviews or personality assessments.

These agents, which in the study were built from interview transcripts and tested against instruments like the Big Five Inventory and the General Social Survey, can simulate responses to hypothetical scenarios, yielding predictions that closely mirror human behaviors.

By eliminating the need to recruit participants or administer surveys, a study replicating the scope of traditional research could be completed in days instead of months.

Researchers could simulate additional scenarios or experiments without incurring significant extra effort or expense.

Beyond the immediate savings, this approach opens new doors for smaller research teams or institutions with limited funding, enabling them to conduct large-scale studies that were previously out of reach.

By reshaping the logistical and economic landscape of social and psychological research, generative agents could significantly accelerate the pace of discovery in understanding human behavior and personality.

Methodology

The study recruited a sample of 1,052 U.S. participants via Bovitz, a study recruitment firm.

They were diverse in terms of age, region, education, ethnicity, gender, income, political ideology, and sexual identity to represent the broader population.

Ages ranged from 18 to 84, with a mean of 48 years.

The participants completed two-hour voice interviews conducted by an AI interviewer, which asked follow-up questions based on participants’ responses, ensuring depth and richness of data.

The interviews explored personal history, values, and opinions on societal topics, capturing an average of 6,491 words per participant.

These transcripts formed the knowledge base for creating the generative agents.

To evaluate the AI agents, the human participants completed the General Social Survey (GSS), the Big Five Personality Inventory (BFI-44), and behavioral economic games like the Dictator Game and Trust Game.

Two weeks after the initial interview, participants retook the surveys and experiments to provide a benchmark for their internal consistency.

The full transcript of each participant’s interview was supplied to a large language model, which was instructed to answer as that participant would; the resulting agents were evaluated by comparing their responses with the participants’ own.
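
One plausible reading of this step is that each transcript is placed in the model's prompt together with a survey question, and the model is asked to answer as the interviewee would. The sketch below illustrates that idea only. The prompt wording, the function names build_agent_prompt and simulate_answer, and the ask_model callable are all hypothetical and do not come from the study; ask_model stands in for whatever language model client a researcher might use.

```python
import re
from typing import Callable, Sequence


def build_agent_prompt(transcript: str, question: str, options: Sequence[str]) -> str:
    """Assemble a prompt asking a model to answer as the interviewee would.

    The wording is an illustrative assumption, not the study's actual prompt.
    """
    numbered = "\n".join(f"{i + 1}. {opt}" for i, opt in enumerate(options))
    return (
        "Below is an interview transcript with a study participant.\n\n"
        f"{transcript}\n\n"
        "Answer the following survey question the way this participant would.\n"
        f"Question: {question}\n"
        f"Options:\n{numbered}\n"
        "Reply with the number of one option only."
    )


def simulate_answer(
    ask_model: Callable[[str], str],
    transcript: str,
    question: str,
    options: Sequence[str],
) -> str:
    """Query a model (any callable mapping prompt text to reply text)
    and map its reply back to one of the survey options."""
    reply = ask_model(build_agent_prompt(transcript, question, options))
    match = re.search(r"\d+", reply)  # first integer in the reply, if any
    index = int(match.group()) - 1 if match else -1
    if not 0 <= index < len(options):
        raise ValueError(f"Model reply did not select a valid option: {reply!r}")
    return options[index]


if __name__ == "__main__":
    # A stub model that always picks option 2, used only to show the call shape.
    stub_model = lambda prompt: "2"
    answer = simulate_answer(
        stub_model,
        transcript="(interview transcript text goes here)",
        question="Can most people be trusted?",
        options=["Yes", "No", "It depends"],
    )
    print(answer)
```

Passing the model in as a plain callable keeps the sketch independent of any particular vendor's client library, so the same code can be exercised with a stub before a real model is attached.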

But 85% is far from perfect, right?

While this may not seem perfect, it is remarkably close to human-like accuracy, particularly since human respondents themselves are inconsistent over time.

The researchers measured this inconsistency by asking participants to retake surveys and experiments two weeks after their initial responses.

The human participants’ own replication accuracy — the rate at which they gave the same answers on both occasions — was 81% on average.

This indicates that human responses naturally vary, influenced by factors like memory, mood, and context.

In other words, the agents’ accuracy approaches the ceiling set by how consistently humans reproduce their own answers.

As the researchers note, the generative agents “predict participants’ behavior and attitudes well, especially when compared to participants’ own rate of internal consistency.”

So while improvements are still possible, the agents are already effective enough to simulate behaviors and attitudes with a level of fidelity that mirrors real-world variability in human responses.
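
To make the comparison between agent accuracy and human self-consistency concrete, here is a minimal sketch of how the two quantities could be computed and related. It assumes that accuracy is a simple match rate over categorical answers and that an agent's score can be normalized by dividing it by the participant's own two-week replication rate; the article does not spell out a formula, so treat both as assumptions. The function names and the toy answers are invented for illustration and are not study data.

```python
from typing import Sequence


def match_rate(first: Sequence[str], second: Sequence[str]) -> float:
    """Fraction of items on which two answer lists agree."""
    if not first or len(first) != len(second):
        raise ValueError("Answer lists must be non-empty and of equal length")
    return sum(a == b for a, b in zip(first, second)) / len(first)


def normalized_accuracy(
    agent_answers: Sequence[str],
    wave1_answers: Sequence[str],
    wave2_answers: Sequence[str],
) -> float:
    """Agent accuracy relative to the participant's own consistency.

    agent accuracy   = agreement between the agent and the first survey wave
    self-consistency = agreement between the participant's two waves
    The ratio of the two is the assumed normalization.
    """
    self_consistency = match_rate(wave1_answers, wave2_answers)
    if self_consistency == 0:
        raise ValueError("Participant shows no consistency; ratio is undefined")
    return match_rate(agent_answers, wave1_answers) / self_consistency


if __name__ == "__main__":
    # Toy data: five survey items answered by one participant twice,
    # plus the answers of a simulated agent for the same items.
    wave1 = ["a", "b", "a", "c", "b"]
    wave2 = ["a", "b", "a", "c", "a"]  # participant changes one answer
    agent = ["a", "b", "c", "c", "a"]  # agent misses two items

    print("self-consistency:", match_rate(wave1, wave2))
    print("agent accuracy:", match_rate(agent, wave1))
    print("normalized:", normalized_accuracy(agent, wave1, wave2))
```

Under this framing, a normalized score of 1.0 would mean the agent predicts a person's answers as well as that person's own earlier answers do, which is why the human replication rate acts as a practical ceiling.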

A New Era of Accelerated Social Science Discovery

This new methodology allows for rapid hypothesis testing, simultaneous exploration of multiple research questions, and instant availability of results, drastically shortening the time from concept to insight.

Meta-analyses, traditionally time-intensive, could become standard practice, allowing researchers to validate findings across large datasets quickly and systematically.

By testing complex interactions, exploring diverse scenarios, and developing personalized theories, the field could address long-standing challenges like the reproducibility crisis while advancing ethical and policy interventions at scale.

“Human behavioral simulation—general-purpose computational agents that replicate human behavior across domains—could enable broad applications in policymaking and social science,” the researchers write.

Study details:

Title: “Generative Agent Simulations of 1,000 People”
Date Submitted: 15 Nov 2024
Authors: Joon Sung Park (Stanford University), Carolyn Q. Zou (Stanford University/Northwestern University), Aaron Shaw (Northwestern University), Benjamin Mako Hill (University of Washington), Carrie Cai (Google DeepMind), Meredith Ringel Morris (Google DeepMind), Robb Willer (Stanford University), Percy Liang (Stanford University), Michael S. Bernstein (Stanford University)
Link: https://arxiv.org/abs/2411.10109
