How ‘Gender Shades’ Sheds Light on Bias in Machine Learning


To engage my AI and Data Protection students in a critical exploration of AI bias, I looked for research that cuts through the opinion and belief that dominate everyday commentary on machine learning bias. “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification” by Buolamwini and Gebru (2018) is a prime example. It offers a solid, evidence-based foundation for understanding bias, and it is essential reading for anyone studying AI, a field often clouded by subjective viewpoints.

“Gender Shades” confronts a critical and often overlooked issue in Artificial Intelligence (AI): bias in commercial gender classification systems. This is particularly significant given how quickly AI is being integrated into many facets of society. The research is grounded in the growing recognition that AI, if not carefully developed and monitored, can perpetuate and exacerbate existing societal biases. Buolamwini and Gebru focus on the intersectionality of bias, examining how AI systems perform across different combinations of gender and skin type, a perspective that was relatively underexplored in AI research before their study.

METHODOLOGY

The methodology adopted by Buolamwini and Gebru is both comprehensive and innovative. They evaluate three leading commercial gender classification systems: those developed by IBM, Microsoft, and Face++. To assess the performance of these systems, the authors use the Pilot Parliaments Benchmark (PPB), a dataset they created which includes a balanced representation of genders and skin tones. The PPB consists of 1,270 images of parliamentarians from three African and three European countries, designed to balance the scales in terms of skin type and gender representation, a significant departure from the usual datasets that are skewed towards lighter-skinned individuals. This methodological approach not only provides a more accurate assessment of the AI systems but also sets a new standard for evaluating AI bias.
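As a rough illustration of what "balanced" means here, the sketch below counts how much of a benchmark each gender and skin type subgroup occupies. The records are invented for the example and are not the PPB data; the function name `subgroup_shares` is likewise hypothetical.

```python
from collections import Counter

# Hypothetical records, one per benchmark image: (gender, skin_type_group).
# These values are made up for illustration and are NOT the PPB data.
records = [
    ("female", "darker"), ("female", "lighter"),
    ("male", "darker"), ("male", "lighter"),
    ("male", "lighter"), ("female", "darker"),
    ("male", "darker"), ("female", "lighter"),
]


def subgroup_shares(rows):
    """Return the share of the dataset held by each (gender, skin type) subgroup."""
    counts = Counter(rows)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


shares = subgroup_shares(records)
for group, share in sorted(shares.items()):
    # A balanced four-way benchmark would show roughly 25% per subgroup.
    print(group, f"{share:.0%}")
```

A dataset skewed towards one subgroup would show up immediately as unequal shares, which is the problem the balanced benchmark was designed to avoid.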

DETAILED FINDINGS

The results of the study are both revealing and concerning. The gender classification systems were at or near their most accurate for lighter-skinned males, with IBM achieving 99.7%, Microsoft 100%, and Face++ 99.2%. In stark contrast, the accuracy rates for darker-skinned females were far lower: 65.3% for IBM, 79.2% for Microsoft, and 65.5% for Face++. These figures highlight a glaring disparity: while the systems nearly perfected gender classification for lighter-skinned males, they frequently misclassified darker-skinned females.

Further dissecting these results, the study shows that for lighter-skinned females, accuracy was better but still not on par with that for lighter-skinned males: 92.9% for IBM, 98.3% for Microsoft, and 94.0% for Face++. For darker-skinned males, every system performed better than it did for darker-skinned females, with accuracies of 88.0% for IBM, 94.0% for Microsoft, and 99.3% for Face++. IBM and Microsoft still lagged behind their own results for lighter-skinned males.

These statistics are alarming because they show that the errors are not spread evenly: they fall most heavily on one group. The disparities suggest that darker-skinned females were underrepresented in the data used to train these systems, leaving the systems less accurate for this group. This raises questions about the fairness and inclusivity of AI systems and has real-world consequences, as these technologies are increasingly used in critical domains like security, employment, and law enforcement.
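To make the size of the disparity concrete, the short sketch below converts the IBM accuracy figures quoted above into error rates and computes the gap between the best and worst served subgroups. The variable and function names are illustrative only.

```python
# IBM accuracy figures (percent) as quoted in the findings above.
ibm_accuracy = {
    ("lighter", "male"): 99.7,
    ("lighter", "female"): 92.9,
    ("darker", "male"): 88.0,
    ("darker", "female"): 65.3,
}


def error_rates(accuracy_by_group):
    """Convert accuracy percentages into error-rate percentages."""
    return {g: round(100.0 - acc, 1) for g, acc in accuracy_by_group.items()}


errors = error_rates(ibm_accuracy)
best = min(errors, key=errors.get)    # subgroup with the lowest error rate
worst = max(errors, key=errors.get)   # subgroup with the highest error rate
gap = round(errors[worst] - errors[best], 1)

print("lowest error: ", best, errors[best])
print("highest error:", worst, errors[worst])
print("gap in percentage points:", gap)
```

Expressed this way, the IBM system's error rate for darker-skinned females is more than a hundred times its error rate for lighter-skinned males, which a single overall accuracy figure would conceal.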

IMPLICATIONS

The implications of Buolamwini and Gebru’s findings extend far beyond the realm of technology into the broader social context. The significant discrepancies in AI performance across different demographics underscore a form of digital discrimination, where certain groups are more likely to be misidentified or excluded by automated systems. This bias in AI can reinforce and amplify existing societal prejudices, leading to a range of negative outcomes, from unfair job screening processes to biased law enforcement practices.

The paper’s findings demand an urgent re-evaluation of how AI systems are trained and deployed. It’s crucial for developers to use diverse datasets that represent the full spectrum of human diversity. Moreover, the findings advocate for increased transparency in AI development. Companies should be required to disclose the performance of their systems across different demographics, enabling users to understand and critique the technology they interact with.

CONCLUSION AND RECOMMENDATIONS

In conclusion, “Gender Shades” by Buolamwini and Gebru is a seminal work in the field of AI ethics, shedding light on the pervasive issue of bias in AI, particularly in gender classification systems. The paper serves as a clarion call for the AI community to adopt more inclusive practices in the development of AI technologies. It emphasises the necessity of diversity, not only in datasets but also in the teams that create and refine these AI systems, to ensure a variety of perspectives are considered.

Based on their findings, Buolamwini and Gebru propose several key recommendations:

1 – Diverse Datasets

The authors emphasise the importance of using diverse datasets that include a balanced representation of all genders and skin tones. This would help in training AI systems that are more accurate and less biased.

2 – Intersectional Analysis

They advocate for intersectional analyses in AI testing. This involves considering multiple axes of identity (such as race, gender, age) simultaneously to understand how overlapping identities impact AI performance.

3 – Transparency and Accountability

The paper calls for increased transparency from companies developing AI technologies. It suggests that companies should disclose the demographics of the datasets used to train their models and the accuracy of their systems across different demographic groups.

4 – Inclusive Development Teams

The authors recommend the inclusion of diverse perspectives in AI development teams. This diversity can help in recognising and addressing potential biases that might not be evident to a more homogeneous group.

5 – Regulatory Oversight

Finally, they propose that there should be regulatory oversight to ensure that AI systems are fair and do not perpetuate existing biases.
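The intersectional analysis described in recommendation 2 can be sketched in a few lines. The example below is a minimal illustration, assuming a list of hypothetical evaluation records invented for this purpose (not results from the paper); the helper `accuracy_by` is also an assumption of the sketch. It shows how accuracy measured on one axis at a time can hide a much weaker result for a combined subgroup.

```python
from collections import defaultdict

# Hypothetical evaluation records, invented for illustration only:
# (skin_type_group, true_gender, predicted_gender)
results = [
    ("lighter", "male", "male"), ("lighter", "male", "male"),
    ("lighter", "female", "female"), ("lighter", "female", "female"),
    ("darker", "male", "male"), ("darker", "male", "male"),
    ("darker", "female", "male"), ("darker", "female", "female"),
]


def accuracy_by(rows, key):
    """Accuracy per group, where `key` maps a record to its group label."""
    hits, totals = defaultdict(int), defaultdict(int)
    for row in rows:
        group = key(row)
        totals[group] += 1
        hits[group] += int(row[1] == row[2])  # 1 if prediction matches truth
    return {g: hits[g] / totals[g] for g in totals}


overall = accuracy_by(results, key=lambda r: "all")
by_skin = accuracy_by(results, key=lambda r: r[0])
by_gender = accuracy_by(results, key=lambda r: r[1])
intersectional = accuracy_by(results, key=lambda r: (r[0], r[1]))

print("overall:", overall)
print("by skin type:", by_skin)
print("by gender:", by_gender)
print("intersectional:", intersectional)
```

In this toy data, accuracy for "darker" and for "female" taken separately is 75% each, but accuracy for darker-skinned females is only 50%. That gap between single-axis and combined results is what an intersectional evaluation is meant to expose.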

These recommendations are aimed at guiding the AI community towards the development of more ethical, fair, and inclusive technologies. The paper “Gender Shades” not only exposes critical flaws in current AI systems but also provides a roadmap for addressing these challenges.

The authors provide practitioners with evidence and proposals that could lead to more equitable and just AI systems. This paper is a foundational text for those interested in understanding and rectifying AI bias, offering both a rigorous analysis of the problem and a hopeful pathway towards more ethical AI development.

Source:

Buolamwini, J. & Gebru, T. (2018). Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification. Proceedings of the 1st Conference on Fairness, Accountability and Transparency, Proceedings of Machine Learning Research 81:77-91. Available from https://proceedings.mlr.press/v81/buolamwini18a.html.

By Nigel Gooding

LLM Information Rights Law & Practice. FBCS

PG Dip Information Rights Law and Practice

PG Cert Data Protection Law and Information Governance

PG Cert Management
