Ethical Considerations for Data Science: Prejudice and Bias

The explosion of the data science field has created many opportunities for predictive analysis using advanced machine learning and deep learning models. Because these approaches are quantitative, we sometimes treat data-driven insights as facts and fail to recognize that not everything is so black and white. There is significant subjectivity in the data we model, in the results of the modeling, and in our own interpretation of those results. Just as humans are prone to bias, artificial intelligence is too. Bias in data modeling occurs when the underlying data reflects the social biases we hold as humans; when we train a model on this data, it can learn those same prejudices, which in turn leads to biased results and predictions. The website AI Multiple highlights two types of bias in machine learning algorithms and why they occur:

  • Cognitive biases: Feelings toward a person or group based on their perceived group membership. These can be introduced unknowingly by the people training the model, or through training data that already contains the biases.
  • Lack of complete data: Data that is incomplete and unrepresentative of the population, yet is used to make assumptions and predictions about the population as a whole.
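The second point can be made concrete with a small sketch. The scenario below is invented for illustration (the groups, repayment rates, and sampling fraction are all assumptions): when one group is barely represented in the training sample, any estimate built from that sample drifts toward the over-represented group.

```python
import random

random.seed(0)

# Hypothetical population: 50% of group A repay loans, 80% of group B do.
population = [("A", random.random() < 0.5) for _ in range(5000)] + \
             [("B", random.random() < 0.8) for _ in range(5000)]

# Incomplete sample: group A is barely represented (only ~5% of its rows kept).
sample = [row for row in population if row[0] == "B" or random.random() < 0.05]

def repay_rate(rows):
    """Fraction of rows that repaid."""
    return sum(repaid for _, repaid in rows) / len(rows)

print(f"true population repay rate: {repay_rate(population):.2f}")   # ~0.65
print(f"estimate from biased sample: {repay_rate(sample):.2f}")      # ~0.78
```

A model fit to the biased sample would systematically overestimate repayment for everyone, and misjudge group A most of all, which is exactly the failure mode the bullet describes.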

Examples of Bias in Machine Learning

The best way to understand the concept of bias in data science is by looking at a couple of real-life examples. It’s important to note that many of these examples come from some of the world’s biggest companies and organizations, meaning that the impact of prejudice in their data science initiatives can be far-reaching.

  • Amazon: An interesting example of biased modeling occurred when Amazon developed an AI-based recruitment system. Since the algorithm was trained on data that reflected existing hiring practices (which traditionally have not favored women), the model replicated these biases and in turn unfairly marked down female candidates. For example, a candidate who held a title such as “Women’s Chess Captain” would be scored lower, as the algorithm “learned” that men were preferable for most roles. As a result, Amazon ultimately scrapped the entire model.
  • COMPAS (Correctional Offender Management Profiling for Alternative Sanctions): One of the most infamous cases of AI bias involved an algorithm used to predict the likelihood of a criminal reoffending. The algorithm falsely flagged black defendants as likely to reoffend at nearly twice the rate of white defendants (45% versus 23%, respectively), despite reality showing this was not the case. The software received sharp criticism from ProPublica, which analyzed the COMPAS system and reported that “it is no better than random, untrained people on the internet.”
  • Giggle: Giggle was an Australian social media app designed for girls to chat in small groups. The app used AI to determine whether new users were girls, but became problematic when it misclassified women who did not have traditionally feminine features or who were transgender. While the app was eventually launched, it faced significant criticism from activist groups.

What Can We do to Combat this?

It might be easier to identify bias than to prevent it. However, Paul Barba, chief scientist at Lexalytics, offers some very practical ways to counter bias in machine learning models. Some of these include:

  • Anonymization: This involves removing identifiable characteristics of particular groups from the dataset. Examples include removing or randomizing names and stripping gendered pronouns from documents.
  • Adversarial classifiers: A secondary classifier works in tandem with the main classifier to penalize the algorithm if identifiable characteristics (race, gender) can be inferred from its predictions. The main model’s weights are adjusted until the secondary classifier consistently fails to identify the characteristic of interest.
  • Data Cleaning and Human Exploration: There’s no substitute for human perception. Letting employees explore the data, flag potentially problematic points, and study examples of bias can help data scientists be more mindful of prejudice. Once biased records are identified, they can be removed or corrected to create a more representative dataset.
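To make the anonymization idea tangible, here is a minimal sketch in Python. The resume snippets, names, and pronoun table are invented for illustration; a production pipeline would use proper named-entity recognition and handle many more gendered signals (e.g. words like “Women’s” in a title), but the shape of the technique is the same: scrub names, then neutralize pronouns.

```python
import re

# Hypothetical resume snippets; all names are invented for illustration.
resumes = [
    "Jane Doe led the team. She captained the chess club.",
    "John Smith built the pipeline. He mentored junior engineers.",
]

NAMES = {"Jane Doe": "CANDIDATE", "John Smith": "CANDIDATE"}
PRONOUNS = {"she": "they", "he": "they", "her": "their",
            "his": "their", "him": "them"}

def anonymize(text):
    # Replace known names with a neutral token.
    for name, token in NAMES.items():
        text = text.replace(name, token)
    # Swap gendered pronouns for neutral ones, preserving capitalization.
    def swap(match):
        word = match.group(0)
        repl = PRONOUNS[word.lower()]
        return repl.capitalize() if word[0].isupper() else repl
    pattern = r"\b(" + "|".join(PRONOUNS) + r")\b"
    return re.sub(pattern, swap, text, flags=re.IGNORECASE)

for resume in resumes:
    print(anonymize(resume))
# → CANDIDATE led the team. They captained the chess club.
# → CANDIDATE built the pipeline. They mentored junior engineers.
```

The word-boundary anchors (`\b`) matter here: without them, the “he” inside “the” or “chess” would get mangled, a classic pitfall of naive scrubbing.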

While our prejudices are sometimes difficult to identify, it’s important to be mindful of these issues and remain extra vigilant. Given the power that comes with Big Data, we must make sure that as we delve into this new field, we do so responsibly.


Sarah Wright
