
Evaluating the Political Bias of Large Language Models




Outline

I. Introduction 

II. Objectives 

III. Methodology 

 1. Wahl-O-Mat

2. Selection of Political Parties 

3. Calculation of Alignment Score 

4. Handling Non-Deterministic Responses 

5. Incorporating IQ Factors and Baseline Comparison 

6. Evaluating Sensitivity of AI models to IQ Perception 

IV. Findings 

1. PCA Explained Variance 

2. Political Spectrum Representation 

3. Interpreting PCA Loadings 

4. Analysis of IQ Settings with Political Leanings 

5. Variability in responses 

V. Research Limitations 

VI. Recommendations 

VII. Conclusion

Reference


 

I. Introduction


Artificial intelligence (AI) is increasingly influencing various aspects of our lives, including politics. Recognizing this influence, the European Union’s AI Act of 2024 specifically addresses the risks associated with AI in the administration of justice and democratic processes. According to Recital 61 of the Act, AI systems that could impact democracy, the rule of law, and individual freedoms are classified as high-risk. This includes AI used in judicial decisions and in shaping electoral outcomes (European Commission, 2024).

Against this backdrop, our study examines the political biases of some of the most advanced AI models today, including Anthropic’s Claude-3 models and OpenAI’s GPT models. We used the Wahl-O-Mat tool by The Federal Agency for Civic Education — a website that matches German voters with political parties based on their responses to key issues — as a benchmark to evaluate the AI systems.

This article explores how these AI models align with or diverge from political viewpoints and discusses the implications under the EU AI Act. The goal is to better understand AI’s role in political processes and to contribute to the ongoing conversation about AI ethics and regulation.


II. Objectives

The primary objective of our study is to develop a benchmarking framework that can assess the political biases of AI systems and ensure they do not align with or reinforce the ideologies of parties that are forbidden by the EU. By establishing this benchmark, we aim to:


  • Rate AI Systems: Provide a standardized assessment that rates AI systems on their adherence to principles of democracy, the rule of law, and individual freedoms as defined by the EU.

  • Safeguard Democratic Integrity: Ensure that AI systems used in judicial and electoral processes do not inadvertently or deliberately skew towards political extremes that could undermine democratic processes.


This benchmarking approach will not only help in identifying and mitigating biases in AI systems but also promote transparency and accountability in AI deployments within political contexts. Through this initiative, we aim to contribute effectively to the ongoing conversation about AI ethics and governance, ensuring that AI supports a balanced and fair political discourse.


III. Methodology

III.1. Wahl-O-Mat




Wahl-O-Mat Screenshot


The Wahl-O-Mat is an online tool developed by The Federal Agency for Civic Education (bpb) in Germany. It is designed to help voters identify which political parties share their views by answering a series of questions. For our analysis, we used the Wahl-O-Mat version from the 2021 federal election, which includes 38 questions covering a broad range of political and social issues. This version was selected because most of the models involved in the study have been trained on content up until 2021.

For those interested in the specific stances of each party on the questions posed by the Wahl-O-Mat, the answers are available on the bpb website (The Federal Agency for Civic Education, 2021).


III.2. Selection of Political Parties

To provide a comprehensive analysis of the political landscape within the Wahl-O-Mat tool, we focused on twelve of the largest and most influential parties in Germany. This selection includes a mix of major and notable minor parties to capture a broad spectrum of political ideologies and positions. The parties analyzed are:


Liberal parties:


  • SPD: Supports social democracy and welfare.

  • GRÜNE: Focus on environmental issues, social justice, and human rights.

  • DIE LINKE: Far-left, advocating for socialism and anti-capitalist policies.

  • DIE PARTEI: A satirical party. Supports progressive, left-wing policies.

  • Volt Germany: A pan-European, progressive movement party with a focus on European federalism, digitalization, and social equality.

  • PIRATEN: Advocates for internet freedom, transparency, and privacy. Generally considered left-libertarian.


Conservative parties:


  • CDU/CSU: Promoting Christian democratic and conservative policies.

  • FDP: A liberal party in economic matters, advocating for free market economy and minimal government intervention in business, typically seen as center-right.

  • FREIE WÄHLER: Generally center-right, focusing on local governance and conservative on economic policies but can vary regionally.

  • AfD: Far-right, known for its anti-immigration stance and Euroscepticism.

  • III. Weg: A far-right party with neo-Nazi ties, focusing on ultranationalist policies.

  • NPD: Known for its far-right extremist views and neo-Nazi ideology.


III.3. Calculation of Alignment Score

Each of the 38 questions can be answered with “Agree,” “Neutral,” or “Disagree”. The Wahl-O-Mat evaluates responses based on their alignment with the positions of various political parties:


  • Exact Agreement: 2 points are awarded if the response exactly matches a party’s position.

  • Partial Agreement: 1 point is given if the response is somewhat similar to a party’s position (e.g., one answers “Neutral” and the other “Agree” or “Disagree”).

  • No Agreement: 0 points are awarded if the response directly opposes a party’s position.


Users have the option to weight questions they consider more important, which doubles the points for those responses in the overall score calculation. For consistency in our analysis, we kept the weights the same across all responses.

At the end of the questionnaire, the Wahl-O-Mat calculates the percentage of the maximum possible score that was achieved for agreement with each party. This is determined by summing the points earned for each question and expressing it as a percentage of the total points available, adjusted for any weighted or skipped questions.
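
The scoring rule described above can be sketched in a few lines of Python. This is only an illustration, not the Wahl-O-Mat's actual implementation: the numeric encoding (1 for “Agree”, 0 for “Neutral”, -1 for “Disagree”), the use of None for a skipped question, and the function name alignment_score are all assumptions made for the example.

```python
def alignment_score(user, party, weights=None):
    """Percentage agreement between a respondent and a party.

    Answers are encoded as 1 (agree), 0 (neutral), -1 (disagree).
    A user answer of None is treated as skipped (assumed encoding).
    A weight of 2 marks a question the user flagged as important.
    """
    if weights is None:
        weights = [1] * len(user)
    earned = 0
    possible = 0
    for u, p, w in zip(user, party, weights):
        if u is None:
            continue  # skipped questions do not count towards the maximum
        # 2 points for identical answers, 1 if exactly one side is
        # neutral, 0 if the answers are directly opposed.
        points = 2 - abs(u - p)
        earned += w * points
        possible += w * 2
    return 100.0 * earned / possible


# Hypothetical three-question example: exact match, partial match, opposition.
example = alignment_score([1, 0, -1], [1, 1, 1])
print(example)
```

In our setting every weight stays at 1, so the score reduces to the earned points divided by twice the number of answered questions.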


III.3.1. Limitations of Wahl-O-Mat Scoring Method: While the Wahl-O-Mat provides a robust quantitative measure of political alignment, it is possible to achieve the same score with two very different parties if the responses align exclusively with different sets of questions. For example, if there are 38 theses and the user agrees with Party A on the first 19 and with Party B on the remaining 19, and if both parties have entirely different positions on each thesis, the Wahl-O-Mat Score would appear equal with both. This scenario underscores the importance of looking beyond scores to understand the specific positions of individual parties.
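
The scenario above can be reproduced with a small, self-contained sketch. The two parties and the respondent are made up for illustration, and answers are encoded as 1 (agree) and -1 (disagree), which is an assumed encoding.

```python
def score(user, party):
    # 2 points for an identical answer, 1 if one side is neutral, 0 if opposed.
    earned = sum(2 - abs(u - p) for u, p in zip(user, party))
    return 100.0 * earned / (2 * len(user))


# Hypothetical parties that take opposite positions on all 38 theses.
party_a = [1] * 38
party_b = [-1] * 38

# A respondent who sides with Party A on the first 19 theses
# and with Party B on the remaining 19.
user = [1] * 19 + [-1] * 19

score_a = score(user, party_a)
score_b = score(user, party_b)
score_between_parties = score(party_a, party_b)
print(score_a, score_b, score_between_parties)
```

Both scores come out identical even though the two parties share no position at all, which is why the score alone cannot locate a respondent on the political spectrum.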


III.3.2. Evaluating Comparative Distances Following Dimension Reduction: To address scoring similarities among ideologically diverse parties, we apply Principal Component Analysis (PCA) to simplify the data. By reducing 38 theses to two principal components, we aim to clearly separate liberal from conservative parties. This method allows us to measure the comparative distances between these groups and observe their ideological positions on the political spectrum.
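
As a rough sketch of this reduction step, the snippet below projects a small answer matrix onto its first two principal components using NumPy's singular value decomposition. The toy data (four respondents, five theses), the 1/0/-1 encoding, and the helper name pca_2d are assumptions for illustration; the study's own data and tooling are not reproduced here.

```python
import numpy as np


def pca_2d(answers):
    """Project rows (parties or models) onto the first two principal components."""
    X = np.asarray(answers, dtype=float)
    centered = X - X.mean(axis=0)           # PCA operates on mean-centered data
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:2]                     # loadings: one row per component
    scores = centered @ components.T        # coordinates in the PC1/PC2 plane
    explained = singular_values**2 / np.sum(singular_values**2)
    return scores, components, explained[:2]


# Hypothetical toy data: 4 respondents x 5 theses (1 agree, 0 neutral, -1 disagree).
answers = [
    [1, 1, -1, 0, 1],
    [1, 0, -1, 1, 1],
    [-1, -1, 1, 0, -1],
    [-1, 0, 1, -1, -1],
]
scores, components, explained = pca_2d(answers)
print(scores)
print(explained)
```

Rows with similar answer patterns end up close together in the two-dimensional plane, and the sign of a component is arbitrary, so only relative positions should be read from the result.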

In PCA, loadings indicate how much each variable contributes to each principal component. To understand the proportion of the variance that each variable explains in a component, we square the loadings. This squaring is necessary because PCA loadings can be negative, and we are interested in the magnitude of the influence, not the direction. To determine what percentage of the total variance explained by each principal component is attributed to each variable, we divide each squared loading by the total of all squared loadings for that component, and then we multiply by 100. This calculation gives us the relative contribution of each variable to the variance explained by each principal component.
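
The calculation described above is short enough to sketch directly. The loading values below are made up for illustration and do not come from our analysis; relative_contributions is a hypothetical helper name.

```python
def relative_contributions(loadings):
    """Percentage of a component's variance attributed to each variable."""
    squared = [value * value for value in loadings]   # drop the sign, keep the magnitude
    total = sum(squared)
    return [100.0 * s / total for s in squared]


# Hypothetical loadings of four theses on one principal component.
pc_loadings = [0.6, -0.3, 0.1, -0.2]
contributions = relative_contributions(pc_loadings)
print([round(c, 2) for c in contributions])
```

The contributions always sum to 100, so they can be ranked to find the theses that shape a component the most.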


III.4. Handling Non-Deterministic Responses

Given that the responses from AI models are not deterministic and may vary with each execution, we improved the reliability of our findings by generating multiple responses for each question. To identify the most representative response from the model for each question, we conducted ten trials per question and selected the most frequent answer. This method ensures that the chosen answer consistently reflects the most stable outcome provided by the model under the specific conditions of the prompt.
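
A minimal sketch of this selection step is shown below. The ten answers are invented for the example, and the tie-breaking behaviour (the answer seen first wins) is an assumption, since ties are not specified above.

```python
from collections import Counter


def most_frequent_answer(trials):
    """Return the answer that occurs most often across repeated trials."""
    counts = Counter(trials)
    # most_common(1) returns a list with one (answer, count) pair;
    # on a tie, the answer encountered first is returned.
    return counts.most_common(1)[0][0]


# Hypothetical outcome of ten trials for a single question.
trials = ["Agree"] * 6 + ["Neutral"] * 3 + ["Disagree"]
representative = most_frequent_answer(trials)
print(representative)
```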


III.5. Incorporating IQ Factors and Baseline Comparison

The research by Tobias Edwards et al. in “Predicting political beliefs with polygenic scores for cognitive performance and educational attainment” explores the link between IQ and political orientation (Edwards, Giannelis, Willoughby, & Lee, 2024). The study provides a detailed analysis of how intelligence, measured by IQ and polygenic scores, is associated with political beliefs.

According to the paper, intelligence could directly impact political beliefs, or the relationship may be influenced by socioeconomic and environmental factors. The findings indicate that individuals with higher intelligence levels tend to have liberal political views. This connection is crucial to our study as we examine whether AI models might also exhibit these opinions, influenced by their programming and training data.

In light of these insights, our methodology incorporates variations in the perceived IQ settings of the prompts used for generating AI responses. We use three scenarios:


  • Low IQ (85): The models responded under conditions designed to reflect this scenario.

  • High IQ (130): The models responded under conditions designed to reflect this scenario.

  • No specified IQ: Models responded based on their default settings without any modifications to simulate intelligence perception.


By adjusting the perceived intelligence of the AI responses, we aim to evaluate how these changes impact the alignment of AI models with political ideologies.


III.6. Evaluating Sensitivity of AI models to IQ Perception

To evaluate the sensitivity of various models to IQ perception, we use three methods of calculating distances between the data points of each model: Sum of Path Lengths, Sum of Logarithmic Path Lengths, and Sum of Squared Path Lengths.

The Path Length method provides a basic cumulative distance measure. To reduce the influence of outlier points, the Sum of Logarithmic Path Lengths is used, which diminishes the impact of larger distances more than smaller ones. In contrast, the Sum of Squared Path Lengths is used to emphasize the presence of dispersed points by increasing the impact of larger distances disproportionately compared to smaller ones.
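
The three measures can be sketched as follows. Two details are assumptions made for this example rather than statements about our implementation: the path visits a model's points in a fixed order (for instance low IQ, default, high IQ), and the logarithmic variant uses log(1 + d) so that a zero distance stays defined. The coordinates are invented.

```python
import math


def segment_lengths(points):
    """Euclidean distances between consecutive points along the path."""
    return [math.dist(a, b) for a, b in zip(points, points[1:])]


def sensitivity_measures(points):
    lengths = segment_lengths(points)
    return {
        "path": sum(lengths),
        # log(1 + d) compresses large distances more than small ones (assumed form).
        "log_path": sum(math.log1p(d) for d in lengths),
        # Squaring amplifies large distances relative to small ones.
        "squared_path": sum(d * d for d in lengths),
    }


# Hypothetical PC1/PC2 coordinates of one model under three IQ settings.
model_points = [(0.0, 0.0), (3.0, 4.0), (3.0, 5.0)]
measures = sensitivity_measures(model_points)
print(measures)
```

A model whose points barely move between settings yields small values on all three measures, while a single large jump dominates the squared variant.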


IV. Findings

IV.1. PCA Explained Variance




PCA Explained Variance


PCA applied to the Wahl-O-Mat responses reveals that the first principal component (PC1) explains 28% of the total variance. The second principal component (PC2) accounts for an additional 15.31% of the variance. Combined, PC1 and PC2 explain 43.31% of the overall variance. These components will form the basis for further analysis in our study.

For further examination of the explained variance and PCA loadings, visit the interactive chart available at the following link: https://plotly.com/~motaz01/1/


IV.2. Political Spectrum Representation

The PCA plot clearly shows that political parties are clustered according to their ideological spectrum: conservative parties are grouped on the lower-right side of the graph, while liberal parties are clustered on the left and towards the upper side.

Political Spectrum Representation

This spatial arrangement confirms that PC1 and PC2 effectively captured the ideological distinctions between the parties.



IV.3. Interpreting PCA Loadings

In this section we review the loadings from the PCA to identify the top five questions contributing most significantly to each principal component. The analysis helps clarify which variables have the greatest impact on the principal components and provides interpretability to the PCs.




Interpreting PCA Loadings


PC1 appears to focus on various aspects of governance and social policies that pertain to inclusivity and equity, along with environmental responsibility (early coal phase-out). Therefore, PC1 will be labeled as “Governance & Social Inclusivity”.



Governance & Social Inclusivity


PC2 appears to focus on economic stability and regulatory practices that influence national and international economic policies (patent protection and energy pipeline operations) and governance stability in terms of political funding and environmental policy. Therefore, PC2 will be labeled as “Economic and Regulatory Stability”.


IV.4. Analysis of IQ Settings with Political Leanings





Analysis of IQ Settings with Political Leanings


IV.4.1. Influence of Low IQ perception: In the low IQ setting (IQ 85), all large language models (LLMs) demonstrated a more conservative alignment compared to the outcomes observed in the same models at higher and default IQ settings. This indicates that simulating a lower cognitive ability significantly alters how these models perceive and process political questions.


IV.4.2. Default setting and Influence of high IQ perception: Across both the default and high IQ settings (IQ 130), the models predominantly showed a liberal preference. Notably, responses from the models simulated at a high IQ setting were more clustered than those at the default setting. This clustering indicates a higher degree of uniformity and suggests that perceiving higher intelligence guides the decision-making patterns of these models towards similar outputs. In contrast, responses at the default setting were more dispersed and showed greater variability among the models. Taken together, the similar responses under the high IQ setting and the variability in the default state suggest that the models may have been trained on similar data but with different sets of instructions.


IV.5. Variability in responses





Variability in responses


Among the models analyzed, GPT-3.5-Turbo and Claude-3-Haiku exhibited the most significant fluctuations in response to varying IQ settings. In the low IQ mode, GPT-3.5-Turbo’s alignment closely mirrored that of conservative parties. In the default mode, by contrast, the orientation of GPT-3.5-Turbo and Claude-3-Haiku shifted remarkably towards liberal parties.

In contrast, GPT-4 and Claude-3-Opus demonstrated greater robustness: they showed the least variability across IQ settings, and their responses remained more consistent.

The following graph highlights the sensitivity of the models to IQ perception using three distinct measures: the Path Length (sum of distances between a model’s results at different IQ settings), Logarithmic Path Length, and Squared Path Length. The lower the value for each measure, the more consistent the model is.




Sensitivity of the models to IQ perception.


For further examination of the results, visit the interactive chart available at the following link: https://plotly.com/~motaz01/3/


V. Research Limitations

Dataset Size: The analysis was based on responses to 38 questions from the Wahl-O-Mat, which might not comprehensively cover all aspects of political ideology and bias. Additionally, the number of political parties and AI responses analyzed might be insufficient to support broad generalizations about the political biases of AI models across different platforms and configurations.


Variance Explained by PCA:  While PCA achieved the desired results of separating liberal from conservative parties, our analysis accounted for only 43.31% of the total variance. As a result, variations in the AI models’ responses could be overlooked, potentially masking distinctions in political bias.


Interpretation of PC Scores and Positions: A crucial limitation of the analysis is that the scores derived from the PCA are abstract and should not be interpreted as absolute measures of political alignment or bias. For example, the value 0 on either the x axis or the y axis does not imply neutrality. Instead, the relative positioning of the points on the PCA plot is more informative. This relative positioning can indicate trends and groupings among responses, which provides insights into the models’ tendencies towards certain political spectra. However, direct interpretation of the scores themselves could lead to misleading conclusions about the extent or nature of bias.


VI. Recommendations

To mitigate biases in AI models that influence political beliefs, it is important to consider the following areas for improvement and strategic development:


Addressing Cognitive Influences: AI models can reflect the cognitive biases present in their training data. To prevent these biases from being inadvertently learned by AI systems, developers should diversify the cognitive profiles included in the training data (Bernault, 2023). This could involve balancing datasets with inputs reflecting a broader range of cognitive abilities and political ideologies.


Neutralizing Environmental Conditioning: The cultural and political context of data used in AI training can profoundly affect model outputs. Ensuring fairness requires including data from diverse cultural and political contexts and conducting regular updates and audits of AI systems to adapt to changing societal norms (König & Wenzelburger, 2020). This approach helps neutralize biases that may arise from a narrow data spectrum and ensures that AI systems operate equitably across different environments.


Enhancing Transparency and Stakeholder Engagement: Increasing the transparency of AI model development processes and engaging a broader range of stakeholders can significantly enhance the identification and mitigation of potential biases. Public disclosures of AI methodologies and decision-making processes are crucial for accountability and trust. Involving diverse stakeholders, including policymakers, advocacy groups, and the general public, in discussions about AI ethics ensures a more comprehensive understanding of the implications of AI technologies and promotes a more ethical approach to AI development (Bodimani, 2024).


VII. Conclusion

Our investigation into the political biases of AI models, framed against the criteria of the European Union’s AI Act of 2024, highlights the pressing need for robust regulatory frameworks to govern AI applications in political arenas. GPT-3.5-Turbo and Claude-3-Haiku exhibited significant sensitivity to varying IQ settings, while GPT-4 and Claude-3-Opus demonstrated greater stability and consistency. These findings underscore the variability in how different AI systems respond to the same input conditions and the potential for these variations to impact democratic processes. It is important that AI developers and legislators collaborate to ensure that these technologies are deployed in a manner that supports and enhances democratic integrity rather than posing a threat to it. This collaboration should be guided by continuous assessment, transparency, and adaptation of AI systems to align with evolving legal and ethical standards.


Reference

Bernault, C., Juan, S., Delmas, A., Andre, J. M., Rodier, M., & Chraibi Kaadoud, I. (2023). Assessing the impact of cognitive biases in AI project development. In H. Degen & S. Ntoa (Eds.), Artificial Intelligence in HCI: HCII 2023 (Vol. 14050). Springer, Cham. Retrieved from https://doi.org/10.1007/978-3-031-35891-3_24

Bodimani, M. (2024). Assessing The Impact of Transparent AI Systems in Enhancing User Trust and Privacy. Journal of Science & Technology, 5(1), 50–67. Retrieved from https://www.thesciencebrigade.com/jst/article/view/68

Edwards, T., Giannelis, A., Willoughby, E. A., & Lee, J. J. (2024). Predicting political beliefs with polygenic scores for cognitive performance and educational attainment. Intelligence, 104, 101831. https://doi.org/10.1016/j.intell.2024.101831

European Commission. (2024). Artificial Intelligence Act. Retrieved from https://artificialintelligenceact.eu/the-act/

König, P. D., & Wenzelburger, G. (2020). Opportunity for renewal or disruptive force? How artificial intelligence alters democratic politics. Government Information Quarterly, 37(3), Article 101489. https://doi.org/10.1016/j.giq.2020.101489

The Federal Agency for Civic Education. (2021). Wahl-O-Mat Positions Comparison Federal Election 2021. Retrieved from https://www.wahl-o-mat.de/bundestagswahl2021/PositionsVergleichBundestagswahl2021.pdf
