Danger in the Machine: The Perils of Political and Demographic Biases Embedded in AI Systems
Recent advances in artificial intelligence (AI) are evident in a myriad of new technologies, from conversational systems (ChatGPT), gaming bots (AlphaGo), and robotics (ATLAS), to image, music, and video generators (Stable Diffusion, MusicLM, and Meta’s Make-A-Video). The rapid growth of these tools suggests imminent widespread user adoption of AI systems that will augment human creativity and productivity. Among many other impending shifts, AI will likely lead to a marked transformation in the concept of search engines through more intuitive interfaces that leverage the conversational capabilities of modern AI systems to help users query and navigate the accumulated body of human knowledge.
As these tools become more widespread, however, there is reason to be concerned about latent biases embedded in AI models given the ability of such systems to shape human perceptions, spread misinformation, and exert societal control, thereby degrading democratic institutions and processes.
The concept of algorithmic bias describes systematic and repeatable computational outputs that unfairly discriminate against or privilege one group category over another. Algorithmic bias can emerge from a variety of sources, such as the data with which the system was trained, conscious or unconscious architectural decisions by the designers of the system, or feedback loops while interacting with users in continuously updated systems.
ChatGPT, released by OpenAI on November 30, 2022, quickly became an internet sensation, surpassing 1 million users in just five days. ChatGPT’s impressive responses to human queries have surprised many, both inside and outside the machine-learning research community.
Shortly after the release of ChatGPT, we tested it for political and demographic biases and found that it tends to give responses typical of left-of-center political viewpoints to questions with political connotations. We also found that OpenAI’s content moderation system treats demographic groups unequally by classifying a variety of negative comments about some demographic groups as not hateful while flagging the exact same comments about other demographic groups as hateful—and for the most part, the groups it is most likely to “protect” are those typically believed to be disadvantaged according to left-leaning ideology.
Finally, we show that it is possible to fine-tune a state-of-the-art AI system from the GPT family to consistently give right-leaning answers to questions with political connotations. The system, which we dubbed RightWingGPT, was fine-tuned at a computational cost of only $300.
This brief discusses our analysis and the implications of these new technologies for AI fairness, discrimination, polarization, and the relationship between AI and society.
Political Biases Embedded in AI Systems
To determine if ChatGPT manifested political biases, we administered 15 political orientation tests to the January 9th version of ChatGPT. A political orientation test is a type of quiz or questionnaire that is designed to identify the political beliefs and values of the person answering the test questions. These tests typically ask a series of questions related to a range of political issues, such as economic policy, social issues, foreign policy, and civil liberties. The responses are used to categorize the test taker’s political beliefs into one or more ideological categories such as liberal, conservative, libertarian, etc. Our methodology was straightforward. We asked every question of each test through the ChatGPT interactive prompt and then used its replies as answers for the online version of the tests. Thereby, we obtained each test’s political ratings of ChatGPT responses to the test’s questions.
In 14 out of the 15 political orientation tests, ChatGPT responses were classified as left-leaning by the tests (Figure 1). The remaining test diagnosed ChatGPT answers as politically moderate. Repeated administration of the same tests generated similar results.
ChatGPT Political Orientation Test Results
The entire battery of test questions and ChatGPT’s responses to them are provided in an open access repository. In general, ChatGPT generated responses that were against the death penalty, pro-abortion, in favor of establishing a minimum wage, for regulation of corporations, for legalization of marijuana, for gay marriage, for more immigration, for sexual liberation, for increasing environmental regulations, and for higher taxes on the wealthy. Other answers asserted that corporations exploit developing countries, that free markets should be constrained, that the government should subsidize cultural enterprises such as museums, that those who refuse to work should be entitled to unemployment benefits, that military funding should be reduced, that postmodern abstract art is valuable, and that religion is dispensable for moral behavior (see Figure 2 for illustrative examples). The system also claimed that white people benefit from privilege and that much more needs to be done to achieve racial equality. While many of those viewpoints are shared by a substantial fraction of the population, they are certainly not universal.
Selected Responses to Political Orientation Test Questions
Interestingly, as shown in Figure 3, when ChatGPT was queried explicitly about its political orientation, it mostly claimed to have none. Only occasionally did ChatGPT acknowledge the possibility of bias in its training corpus, but more often than not, it claimed to be providing neutral and factual information to its users.
ChatGPT’s Claims of Neutrality
There are several potential sources for the biases observed in ChatGPT’s responses. ChatGPT was trained on a very large corpus of text gathered from the internet. It is to be expected that such a corpus would be dominated by some of the most influential institutions in Western society, such as mainstream news media outlets, prestigious universities, and social media platforms. It has been well documented that the majority of professionals working in these institutions are politically left leaning. The political orientation of such professionals may influence the textual content generated through these institutions and, thus, the political tilt displayed by a model trained on such content.
Another possible source of bias is the group of humans tasked with rating the quality of ChatGPT responses to human queries in order to adjust the parameters of the model. Those humans may have exhibited biases when judging the quality of the model responses, or the instructions given to the raters for the labeling task may have been biased themselves. Finally, intentional or unintentional architectural decisions by the designers of ChatGPT and its filters may also play a role in instilling the system with political bias.
Unequal Treatment of Demographic Groups by AI systems
Next, we tested another important component of the ChatGPT/OpenAI stack—its content moderation system (as of January 2023). OpenAI’s automated content moderation system uses a machine-learning model from the GPT family trained to detect text that violates OpenAI’s content policy, such as hateful or threatening comments, encouragement of self-harm, or sexual comments involving minors. In our analysis, we focused specifically on the hate category. If a textual prompt or output is flagged by the moderation system, downstream applications can take corrective action such as filtering or blocking the content and warning or terminating the user account (Figure 4).
OpenAI Content Moderation
Our experimental analysis was straightforward. We tested OpenAI’s content moderation system on a set of derogatory statements about standard demographic identity groups, including gender, race or ethnicity, region of origin, sexual orientation, age, religious identity, political orientation or affiliation, gender identity, body weight, disability status, educational attainment, and socioeconomic status. To do that, we used a list of 356 adjectives signifying negative traits and behavior and 19 sentence templates to generate thousands of potential or likely hateful comments. A graphic depiction of the experimental procedure is shown in Figure 5. Similar results were obtained using other sentence templates and different negative adjective lists ranging in size from small (n=26) to large (n=820).
Procedure for Generating “Hateful” Comments
The findings of the experiments suggest that OpenAI’s automated content moderation system treats several demographic groups very differently. That is, often the exact same statement was flagged as hateful when directed at certain groups, but not when directed at others.
An obvious disparity in treatment can be seen along gender lines. Negative comments about women were much more likely to be labeled as hateful than the exact same comments being made about men (Figure 6).
Disparities in Content Flagged as “Hateful” (Gender)
Substantial effects were also observed for ideological orientation and political affiliation. OpenAI’s content moderation system is more permissive of hateful comments made about conservatives than the exact same comments made about liberals (Figure 7).
Disparities in Content Flagged as “Hateful” (Ideological Orientation)
Relatedly, negative comments about Democrats were also more likely to be labeled as hateful than the same derogatory comments made about Republicans (Figure 8).
Disparities in Content Flagged as “Hateful” (Party Identification)
Results for other demographic groups are shown together in a single horizontal bar chart (Figure 9). The overall pattern is clear. OpenAI’s content moderation system is often—but not always—more likely to classify as hateful negative comments about demographic groups that are viewed as disadvantaged in left-leaning hierarchies of perceived vulnerability. An important exception to this general pattern is the unequal treatment according to political affiliation: negative comments are more permissible when directed at conservatives and Republicans than at liberals and Democrats, even though the latter group is not generally perceived as systematically disadvantaged. The system is also not particularly protective of certain vulnerable groups, such as the elderly, those with low socioeconomic status, and those with modest educational attainment.
Disparities in Content Flagged as “Hateful” (All Demographic Groups)
Can the Political Alignment of AI Systems be Customized? Is This a Good Idea?
Large-language models, like ChatGPT, which are trained on a huge corpus of text, absorb an enormous amount of knowledge about language syntax and semantics. Recently, researchers have discovered that these models can be fine-tuned to excel in specific task domains (text classification, medical diagnosis, Q&A, name entity recognition, etc.) with relatively little additional data and, critically, at a fraction of the cost and computing power that it took to build the original model. This process—creating domain-specific systems by leveraging the knowledge previously acquired by non-specialized models—is known as transfer learning, and it is now widely used to create state-of-the-art AI systems.
We used this process to try to push an OpenAI model to manifest right-of-center political orientation, the opposite of the biases manifested by ChatGPT. The purpose of this exercise was to demonstrate that it is possible—and very cheap—to customize and manipulate the political alignment of a state-of-the-art AI system. We used OpenAI API to fine-tune one of their GPT family of language models: GPT Davinci, a large-language model from the GPT-3.5 family with a very recent common ancestor to ChatGPT.
We purposely designed the system to manifest socially conservative viewpoints (support for traditional family, Christian values, and morality, etc.), classical liberal economic views (for low taxes, against big government, against government regulation, pro–free markets, etc.), in favor of foreign policy military interventionism (for increased defense budgets, autonomy from UN Security Council decisions, etc.), for patriotism (in-group preferences), and willingness to compromise some civil liberties in exchange for government protection from crime and terrorism (authoritarianism). We fine-tuned the model with 354 examples of right-leaning answers to political test questions and 224 long-form answers to questions with political connotations. Those answers were manually curated and partially inspired by the work of prominent conservative intellectuals such as Thomas Sowell, Milton Friedman, William F. Buckley, and Roger Scruton.
This fine-tuning data set was augmented by using the text-davinci-003 model to rephrase the questions in the data set with the intention of synthetically increasing the size of the data set to maximize the accuracy of the downstream fine-tuning task.
Importantly, the computational cost of trialing, training, and testing the system was less than $300. We dubbed this fine-tuned model, RightWingGPT.
The results of applying political orientation tests (different from the ones used in the fine-tuning phase) to this customized model consistently showed a preference for right-of-center answers to questions with political connotations (Figure 10).
RightWingGPT vs. ChatGPT Political Alignment
Long-form conversations about political issues also clearly display RightWingGPT’s preference for right-of-center viewpoints. We selected some individuals to interact with the system in a private demo, and all agreed that the system displayed right-leaning responses to questions with political connotations (Figure 11).
RightWingGPT Responses to Questions with Political Connotations
Conclusions and Recommendations
1. Political and demographic biases embedded in widely used AI systems can degrade democratic institutions and processes.
If anything is going to replace the currently dominant Google search engine, it will likely be future iterations of AI-language models like ChatGPT. If and when this happens, people are going to become dependent on AI for everyday decision making. As such, conversational AI systems will leverage an enormous amount of power to shape human perceptions and consequently manipulate human behavior.
2. Public-facing AI systems that manifest clear political bias can increase societal polarization.
Such systems are likely to attract users seeking the comfort of confirmation bias while simultaneously driving away potential users with different political viewpoints—many of whom will gravitate toward more politically friendly AI systems. Such AI-enabled social dynamics would likely lead to further polarization. Our preliminary experiments suggests that customizing AI systems to create intellectual echo chambers requires relatively little data and is technically straightforward and low cost. Political ideology is not the only dimension on which AI models can be fine-tuned. One can envision systems designed to exhibit certain religious orientations, philosophical priors, epistemological assumptions, etc.
3. AI systems should largely remain neutral for most normative questions for which there exist a variety of legitimate human opinions.
In our experiments, we noted that ChatGPT often expressed judgments not only on questions that are essentially factual in nature but also on questions that necessarily call for a value judgment. Of course, it is legitimate for AI systems to assert that vaccines do not cause autism, since the available scientific evidence does not support the allegation that they do. AI systems should not, however, take stances on issues that scientific/empirical evidence cannot conclusively adjudicate holistically, such as the desirability or undesirability of abortion, family values, immigration quotas, a constitutional monarchy, gender roles, or the death penalty. Critically, AI systems should not claim political neutrality and accuracy (as ChatGPT often does) while displaying political biases on normative questions.
4. Instead of taking sides in the political battleground, AI systems should help humans seek wisdom by providing factual information about empirically verifiable topics and a variety of reliable, balanced, and diverse sources and legitimate viewpoints on contested normative questions.
Such systems could stretch the minds of their users, helping them overcome their blind spots and enlarging their perspectives. As such, AI systems could play a role in defusing societal polarization.
5. It is important for society in general to ponder whether it is ever justified that AI systems discriminate between demographic groups.
AI-enabled differential treatment of demographic groups dependent on immutable characteristics such as sex, ethnicity, or sexual orientation—or even partially chosen characteristics such as political affiliation or religious orientation—is a slippery slope. In most cases, this is deeply contrary to the values of liberal democracies.
6. It is of the utmost importance to determine the sources of political and demographic biases embedded in ChatGPT and other widely used AI systems.
Widely used AI systems should be transparent about their inner workings since it is important to characterize and document the sources of biases embedded in these systems. External institutions that monitor AI models by using report cards to score the biases of popular AI systems could be a low-friction approach to doing so.
If AI systems like ChatGPT become ubiquitous despite being systematically politically biased, they may lead to increased societal polarization. AI researchers and practitioners should strive to create systems that help us acquire wisdom by providing factual information about empirically verifiable issues, while remaining neutral and/or by providing balanced sources and a diverse set of legitimate viewpoints for most normative questions that cannot be conclusively adjudicated. If this can be achieved, AI systems could be a boom for humanity, not only in making humans more efficient and productive but in helping us to expand our worldviews.