Data Science has an extremely mixed reputation when it comes to spreading good in the world.

Marinus Analytics have been using facial and tattoo recognition algorithms to identify and rescue victims of human trafficking. However, some of the same image recognition techniques used in law enforcement have been shown to produce far higher rates of false identification for women and minority ethnic groups [1].

If even models developed by experts, with computing power far beyond that of the human brain, contain biases and misunderstandings, then how can we, as humans, hope to be ethically flawless?

The way we work with and learn from data has been developed through multiple empirical approaches. These approaches attempt to lead us to the best quantitatively defined outcomes. So why not apply your data science practices to yourself and see what you learn?

EDA

Let’s start with an acronym! EDA refers to Exploratory Data Analysis, or, in other words, taking a good hard look at what you have before diving in. EDA aims to surface all sorts of interesting findings which may affect the results of your hard work, such as a large skew in your population. Research published in BMC Public Health [2] found that recruiting men into health behaviour research has historically been more difficult than recruiting women. EDA allows us to identify these skews early, to ensure that we don’t develop public health strategies whose beneficiaries would be overwhelmingly women.
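As a minimal sketch (using pandas on a hypothetical recruitment dataset; the column names and values are purely illustrative), a few lines of EDA are enough to surface that kind of skew before any modelling begins:

```python
import pandas as pd

# Hypothetical health-study recruitment records (illustrative only).
df = pd.DataFrame({
    "sex": ["F", "F", "F", "M", "F", "F", "M", "F"],
    "age": [34, 51, 29, 62, 45, 38, 57, 41],
})

# A simple normalised value count exposes the recruitment skew immediately.
print(df["sex"].value_counts(normalize=True))
# F    0.75
# M    0.25

# Per-group summary statistics can reveal further imbalances worth noting.
print(df.groupby("sex")["age"].describe())
```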

This cautious approach encourages a data scientist to start by taking a step back and asking “what am I being told?” In time, they will be rewarded with a more accurate, better calibrated, well understood solution. Or, as I prefer, ‘hasty conclusions lead to crappy solutions’, which is equally applicable to complex human situations.

Overcoming Bias

It’s natural to bring your own experience to your data challenges and to want to test your own hunches first. This process of asking questions of our data, typically undertaken during feature engineering, is fundamental in shaping a data science solution. While we may wish to start with our gut feeling, it is important to consider the unusual or unlikely. This ensures that the solution reflects reality, not simply your perception of reality.
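One lightweight way to test your perception against reality is to rank every candidate feature’s association with the target, rather than only inspecting the ones your gut nominated. A rough sketch, assuming a pandas DataFrame with hypothetical, purely illustrative column names:

```python
import pandas as pd

# Hypothetical data: we *suspect* feature_a drives the target,
# but we rank all features instead of testing only our hunch.
df = pd.DataFrame({
    "feature_a": [1, 2, 3, 4, 5, 6],
    "feature_b": [3, 1, 4, 1, 5, 9],
    "feature_c": [2, 4, 6, 8, 10, 12],
    "target":    [2.1, 3.9, 6.2, 8.1, 9.8, 12.2],
})

# Absolute correlation with the target for every feature, not just the obvious one.
correlations = df.corr()["target"].drop("target").abs().sort_values(ascending=False)
print(correlations)
# The unlikely candidate may turn out to matter more than the one we "felt" was key.
```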

Much like a machine learning algorithm is informed purely by past data, our world view is undoubtedly shaped by our own life experiences. Psychologists teach us that not only is it possible to overcome our cognitive biases, but also that when we do, we are able to make better decisions [3].

Collaboration

Coding is stereotyped as a solitary pursuit. The “rubber duck” method even suggests that you problem-solve with an inanimate object! However, enlisting the opinion of others can, at a bare minimum, confirm your direction of travel. Academics who are able to co-author with an experienced colleague early in their career have been shown to be more successful [4]. Being presented with new ideas or information can be overwhelming, and working through them with others builds better mutual understanding. Or at the very least, your confusion has some company.

Cross Validation

Through the use of cross-validation, we hope to understand how our solution will perform on an independent dataset. This may highlight unappealing attributes such as overfitting or selection bias, which we would be wise to rectify; consequently, it is widely recognised that cross-validating your solution on different folds of data leads to better outcomes. Similarly, a McKinsey report has shown that companies with more diverse management boards perform better.
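As an illustration (a minimal scikit-learn sketch on a toy dataset), k-fold cross-validation scores the same model on several held-out folds in turn, so a model that has merely memorised one slice of the data is exposed quickly:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Five folds: each acts as an unseen validation set exactly once.
scores = cross_val_score(model, X, y, cv=5)
print(scores)          # per-fold accuracy
print(scores.mean())   # a more honest estimate than a single train/test split
```

A wide spread of scores across the folds is the statistical equivalent of friends who disagree with you: uncomfortable, but informative.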

We all have that brilliant friend who will take your side no matter what; however, when you really need to be told to ditch that unsuitable partner/outfit/algorithm choice, you’d be better off with a diverse pool of perspectives to draw upon than with constant cross-validation on a homogeneous sample.

#race #data #data-science #ethics #diversity
