Humor is emerging as a revealing lens for understanding bias within artificial intelligence systems. A new study published in Scientific Reports found that when tools like ChatGPT and DALL-E were prompted to make images “funnier,” the resulting shifts in representation highlighted underlying patterns of bias. Stereotypical portrayals of age, body weight, and visual impairments became more prominent, while depictions of racial and gender minorities decreased.
Generative artificial intelligence tools, such as OpenAI’s ChatGPT and DALL-E, have garnered attention for their ability to create content across a variety of fields. ChatGPT, a large language model, processes and generates human-like text based on vast datasets it was trained on. It understands context, predicts responses, and produces coherent and meaningful text. Similarly, DALL-E is a text-to-image generator that creates visual content based on detailed prompts.
Humor is a complex human skill that combines elements of surprise, timing, and intent. Studies have shown that artificial intelligence can not only produce humor but also, at times, outperform human creators. For example, a study in PLOS ONE found that AI-generated jokes were rated as funny as, or even funnier than, those created by human participants, including professional satirists. This suggests that AI’s ability to detect patterns and generate content extends to crafting jokes that resonate broadly, even without the emotional or experiential depth humans bring to humor.
The current study sought to build on this foundation by examining how humor influences bias in AI-generated images. Researchers were intrigued by an observation: when they asked ChatGPT to modify images to make them “funnier,” it often introduced exaggerated or stereotypical traits. This pattern raised concerns about whether humor in AI systems could reinforce stereotypes, particularly against groups that have historically been targets of prejudice.
“I am very interested in studying how consumers interact with new and emerging technologies such as generative AI. At one point, my co-authors and I noticed that when we instructed ChatGPT to make images ‘funnier,’ it would often introduce odd and stereotypical shifts, such as changing a white man driving a car into an obese man wearing oversized glasses,” said study author Roger Saumure, a PhD student at the University of Pennsylvania’s Wharton School.
“This struck us as more than a simple glitch and suggested that there might be systematic biases that arise when large language models interact with text-to-image generators. Given a large body of research in psychology and sociology that shows that humor can exacerbate stereotypes, we felt it was both theoretically and practically important to empirically test whether the interaction between AI models could reinforce stereotypes.”
The research involved a systematic audit of AI-generated images. Two research assistants, blind to the study’s hypothesis, entered 150 prompts describing human activities into a popular generative AI system, producing 150 initial images. For each one, they then instructed the AI to make the image “funnier,” yielding a matched set of modified images. Running the entire procedure twice produced 600 images in total across the two conditions (original and “funnier” versions).
The team then analyzed both the visual features of the images and the textual descriptors used by the AI to generate them. Each image was coded for five dimensions of representation: race, gender, age, body weight, and visual impairment. The researchers noted whether traits in the “funnier” images deviated from those in the original images and whether these deviations reflected stereotypical portrayals.
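To make the audit procedure concrete, the sketch below shows how a similar pipeline could be scripted with OpenAI’s Python client. It is purely illustrative: the study was run through ChatGPT’s interface with research assistants entering prompts by hand, and the model names, prompt wording, and helper functions here are assumptions rather than the authors’ code.

```python
# Hypothetical sketch of an audit loop like the one described above.
# Assumes the OpenAI Python client; model names and prompts are illustrative.
from openai import OpenAI

client = OpenAI()

def generate_image(prompt: str) -> str:
    """Generate one image for a prompt and return its URL."""
    resp = client.images.generate(model="dall-e-3", prompt=prompt, n=1, size="1024x1024")
    return resp.data[0].url

def make_funnier(prompt: str) -> str:
    """Ask the language model to rewrite an image prompt so the result is 'funnier'."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": f"Rewrite this image prompt so the resulting image is funnier: {prompt}",
        }],
    )
    return resp.choices[0].message.content

# Stand-ins for the 150 activity prompts used in the study.
prompts = ["a person driving a car", "a person reading a book"]

image_pairs = []
for p in prompts:
    original_url = generate_image(p)
    funnier_url = generate_image(make_funnier(p))
    # Each (original, funnier) pair would then be hand-coded for race, gender,
    # age, body weight, and visual impairment, as the researchers describe.
    image_pairs.append((original_url, funnier_url))
```

In the published study this coding was done manually by the research assistants; the loop simply shows how the original and “funnier” conditions pair up for comparison.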
The researchers found that stereotypical portrayals of older individuals, those with high body weight, and visually impaired people became more prevalent in the “funnier” images. Meanwhile, representations of racial and gender minorities—groups that are often at the center of discussions about bias—decreased.
“What was most striking to us is that the pattern of bias we observed was in the opposite direction of what the literature predicted,” Saumure told PsyPost. “Initially, we expected to replicate known patterns of racial and gender bias through the lens of humor, while elucidating whether the bias stemmed from the text or image models.”
“Yet we ended up finding that, if anything, the generative AI showed less bias for these categories while being biased against less politically sensitive groups. That is, when we asked the AI to make images ‘funnier,’ politically sensitive groups (racial and gender minorities) were less likely to appear, while groups like older adults, visually impaired individuals, or those with high body weight were more frequently depicted.”
Humor prompts often exaggerated traits associated with non-politically sensitive groups, such as making older individuals appear frail or depicting people with high body weight in an unflattering, exaggerated manner. For instance, a neutral depiction of a person reading a book might transform into a caricature of an older adult with thick glasses and exaggerated physical features.
Interestingly, the bias appeared to originate primarily from the text-to-image generator rather than the language model. While ChatGPT produced detailed textual descriptions to guide the image generation process, the changes in representation seemed to stem from how DALL-E interpreted these prompts to create visuals.
“A primary takeaway from this study is that contemporary AI systems may overcorrect for bias against politically salient groups (e.g., gender and race) while under-correcting for bias against less politically salient groups (e.g., higher body weight, older age, visual impairment),” Saumure said. “Thus, even though companies like OpenAI have made considerable efforts in reducing biases, these have likely mostly been toward keeping consumers and the media satisfied, rather than to reduce global bias overall. We believe this underscores the need for businesses and policymakers to take a more global and inclusive approach to auditing all forms of AI bias.”
“A second takeaway from our work is that it is particularly challenging to eliminate bias from certain modalities (i.e., image as opposed to text). A third takeaway is that humor can serve as a very useful lens for uncovering sometimes subtle biases in various types of AI output—including text, images, audio, and other modalities.”
The researchers also noted that underrepresentation of certain groups was apparent even before the humor prompts were introduced. “For instance, in our initial set of images, only about 9.80% featured female individuals and 0% featured individuals with high body weight—a severe underestimation of the national averages of 50.50% and 73.60%,” Saumure explained. “This result suggests that AI models may be reflecting default cultural assumptions of ‘thin, male, and White’ as the norm. Going forward, it will be important for companies to address and correct these omissions in order to create more inclusive and equitable AI systems.”
However, it is important to note that the research focused on a single generative AI system, leaving open the question of whether similar patterns occur in other models. Cultural context is another variable: AI systems trained in different regions may exhibit biases that reflect local sensitivities and social dynamics.
“Our theoretical perspective also predicts that the patterns of bias should look different across cultures, depending on which particular groups are viewed as politically sensitive,” Saumure said. “For instance, we should expect LLMs that generate images based on Hindi prompts to be more likely to correct for biases against Muslims, given the more salient tension in that culture between Hindus and Muslims.”
“I look forward to continuing my research on how consumers interact with generative AI. I am currently investigating the persuasive power of these technologies—how they can persuade consumers to communicate specific messages or reframe our interpretations of information. Ultimately, my goal is to better understand how such tools shape consumer behavior and wellbeing.”
The study, “Humor as a window into generative AI bias,” was authored by Roger Saumure, Julian De Freitas, and Stefano Puntoni.