Almost every week, the press highlights examples of machine learning models with biased outputs. With discrimination at the forefront of public discussion, how is social inequality reflected in the biased outputs of ML models? Decisions made at every step of a typical data science pipeline, from formulating questions to collecting data to training and deploying models, can ultimately harm downstream users¹. Our goal is a practical understanding of how different sources of bias can end up reflected in the data. To that end, we'll build examples using synthetic data to illustrate how different sources of bias affect ML outputs and their underlying characteristics. The guiding principle is that a good way to understand something is to build it yourself!
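As a minimal sketch of this build-it-yourself approach (not the article's actual examples), the snippet below generates a synthetic dataset in which two groups have identical underlying ability, injects a simple form of historical label bias against one group, and shows that a model trained on the biased labels reproduces the disparity. The variable names (`skill`, `group`), the 30% label-flip rate, and the use of scikit-learn's `LogisticRegression` are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical setup: a binary group attribute and a skill score that
# determines the true (unbiased) outcome identically for both groups.
group = rng.integers(0, 2, size=n)
skill = rng.normal(0.0, 1.0, size=n)
true_label = (skill + rng.normal(0.0, 0.5, size=n) > 0).astype(int)

# Inject label bias: historical decisions flip some positive outcomes of
# group 1 to negative, so the training data under-records their successes.
bias_mask = (group == 1) & (true_label == 1) & (rng.random(n) < 0.3)
observed_label = np.where(bias_mask, 0, true_label)

X = np.column_stack([skill, group])
X_train, X_test, y_train, y_test, g_train, g_test = train_test_split(
    X, observed_label, group, test_size=0.3, random_state=0
)

model = LogisticRegression().fit(X_train, y_train)
pred = model.predict(X_test)

# Compare positive prediction rates per group: the model picks up the
# historical bias even though the skill distributions are identical.
for g in (0, 1):
    rate = pred[g_test == g].mean()
    print(f"group {g}: positive prediction rate = {rate:.2f}")
```

Because the bias lives in the labels rather than the features, no amount of extra data from the same source will make it disappear; that is exactly the kind of effect the synthetic examples in this article are meant to make visible.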

#data-science #fairness #machine-learning #ai-ethics #bias-in-ai

Sources of unintended bias in training data