Social, racial, and gender bias in data and models has become a major concern in the machine learning industry and in society at large.

Recently, MIT withdrew a popular computer vision dataset from public access after a team of researchers found that it was socially biased and tinged with misogynistic and racist labels.

This discovery in Tiny Images, a dataset of 80 million images, is a telling example of how social bias propagates into machine learning datasets and projects. In 1985, researchers in linguistics and psychology at Princeton University began building WordNet, a semantic lexicon of the English language that has since been widely used for natural language processing tasks. Building on WordNet, MIT scientists released Tiny Images in 2006, a dataset of images collected by running internet image searches for WordNet words and labeling the results with those words. Because WordNet itself contained gendered and racially biased terms, the Tiny Images labels attached to the collected images inherited those biases.
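
To make concrete what "WordNet words" means here, the sketch below uses NLTK's WordNet interface to look up the synsets (word senses) and lemmas of a noun, the kind of entries that served as search queries and labels for Tiny Images. This is only an illustrative example, not the dataset's actual pipeline; the word "engineer" and the NLTK download step are assumptions made for the demo.

```python
# Minimal sketch: inspecting WordNet entries with NLTK.
# Assumes NLTK is installed (pip install nltk); "engineer" is an arbitrary example word.
import nltk

nltk.download("wordnet", quiet=True)  # one-time corpus download
from nltk.corpus import wordnet as wn

# Each synset is one sense of the word; its lemmas are the label strings
# a dataset builder could reuse as image-search queries.
for synset in wn.synsets("engineer", pos=wn.NOUN):
    print(synset.name(), "-", synset.definition())
    print("  lemmas:", [lemma.name() for lemma in synset.lemmas()])
```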

Just a few weeks ago, PULSE, a generative model for self-supervised photo upsampling, drew widespread attention and raised concerns about bias in AI models when it transformed a pixelated image of former United States president Barack Obama into a high-resolution image of a white man.

