This article is part of our reviews of AI research papers, a series of posts that explore the latest findings in artificial intelligence.

Last November, Apple ran into trouble after customers pointed out on Twitter that its credit card service was discriminating against women. David Heinemeier Hansson, the creator of Ruby on Rails, called Apple Card a sexist program. “Apple’s black box algorithm thinks I deserve 20x the credit limit [my wife] does,” he tweeted.

The success of deep learning in the past decade has increased interest in the field of artificial intelligence. But the rising popularity of AI has also highlighted some of the field’s key problems, including the “black-box problem”: the challenge of making sense of how complex machine learning algorithms make their decisions. The Apple Card disaster is one of many manifestations of the black-box problem that have come to light in recent years.

The increased attention to black-box machine learning has given rise to a body of research on explainable AI. Much of the work in this field involves developing techniques that try to explain the decisions made by a machine learning algorithm without breaking open the black box. But explaining AI decisions after they happen can have dangerous implications, argues Cynthia Rudin, professor of computer science at Duke University, in a paper published in the journal Nature Machine Intelligence.

“Rather than trying to create models that are inherently interpretable, there has been a recent explosion of work on ‘explainable ML’, where a second (post hoc) model is created to explain the first black box model. This is problematic. Explanations are often not reliable, and can be misleading, as we discuss below,” Rudin writes.

Such practices can “potentially cause great harm to society,” Rudin warns, especially in critical domains such as healthcare and criminal justice.

Instead, developers should opt for AI models that are “inherently interpretable” and “provide their own explanations,” Rudin argues in her paper. And contrary to what some AI researchers believe, in many cases interpretable models can produce results that are just as accurate as black-box deep learning algorithms.

Two types of black-box AI

Like many things involving artificial intelligence, there’s a bit of confusion surrounding the black-box problem. Rudin differentiates between two types of black-box AI systems: functions that are too complicated for any human to comprehend, and functions that are proprietary.

The first kind of black-box AI includes deep neural networks, the architecture used in deep learning algorithms. DNNs are composed of layers upon layers of interconnected variables that become tuned as the network is trained on numerous examples. As neural networks grow larger and larger, it becomes virtually impossible to trace how their millions (and sometimes, billions) of parameters combine to make decisions. Even when AI engineers have access to those parameters, they won’t be able to precisely deconstruct the decisions of the neural network.

Deep neural networks are composed of several stacked layers of artificial neurons
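
To get a sense of the scale involved, here is a minimal sketch (written with PyTorch, and not taken from Rudin’s paper) that builds a modest fully connected network for small color images and counts its trainable parameters. Even this toy model has tens of millions of weights, none of which corresponds to a human-readable rule.

```python
# Minimal sketch: count the trainable parameters of a small fully connected network.
import torch.nn as nn

model = nn.Sequential(
    nn.Flatten(),                   # 224 x 224 x 3 = 150,528 input values per image
    nn.Linear(224 * 224 * 3, 512),  # this layer alone holds ~77 million weights
    nn.ReLU(),
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),             # 10 output classes
)

total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {total:,}")  # roughly 77 million
```

Modern image and language models are several orders of magnitude larger still, which is why inspecting individual weights reveals almost nothing about how a decision was reached.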

The second type of black-box AI, the proprietary algorithm, refers to companies that hide the details of their AI systems for various reasons, such as protecting intellectual property or preventing bad actors from gaming the system. In this case, the people who created the AI system might have knowledge of its inner logic, but the people who use it don’t. We interact with all kinds of black-box AI systems every day, including Google Search’s ranking algorithm, Amazon’s recommendation system, Facebook’s News Feed, and more. But the more dangerous ones are those used to hand out prison sentences, determine credit scores, and make treatment decisions in hospitals.

While a large part of Rudin’s paper addresses the dangers of neural network black boxes, she also discusses the implications of walled-garden systems that keep their details to themselves.

Explainability vs interpretability

We need to get one more thing out of the way before we dive deeper into the discussion. Most mainstream media outlets covering AI research use the terms “explainable AI” and “interpretable AI” interchangeably. But there’s a fundamental difference between the two.

Interpretable AI refers to algorithms that give a clear explanation of their decision-making processes. Many machine learning algorithms are interpretable. For instance, decision trees split their decisions along explicit, human-readable rules, and linear regression models associate a coefficient with each feature of their input data. You can clearly trace the path your input data takes as it goes through the AI model.

Decision trees provide clear explanations of their reasoning process (source: Wikipedia)
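
As a concrete illustration, the short sketch below (using scikit-learn and its built-in diabetes dataset, chosen here only as an example) shows how both model types expose their own reasoning: the linear model through one coefficient per feature, and the decision tree through explicit if/else rules.

```python
# Minimal sketch: two inherently interpretable models explaining themselves.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor, export_text

X, y = load_diabetes(return_X_y=True, as_frame=True)

# Each coefficient says how much a one-unit change in a feature moves the prediction.
linear = LinearRegression().fit(X, y)
for name, coef in zip(X.columns, linear.coef_):
    print(f"{name}: {coef:.1f}")

# The tree's reasoning can be printed as plain if/else rules.
tree = DecisionTreeRegressor(max_depth=3).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))
```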

In contrast, explainable AI refers to tools that are applied to algorithms that don’t provide a clear explanation of their decisions. Researchers, developers, and users rely on these auxiliary tools and techniques to make sense of the logic of black-box AI models. For instance, for deep learning–based image classifiers, researchers develop models that create saliency maps, which highlight the pixels in the input image that contributed to the AI’s output.
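
To make the idea concrete, here is a minimal sketch of one such technique, a plain gradient-based saliency map (the pretrained ResNet model and the “cat.jpg” input file are placeholders for illustration, not a method from Rudin’s paper): it marks the pixels the classifier’s top prediction is most sensitive to.

```python
# Minimal sketch: gradient-based saliency map for an image classifier.
from torchvision import models, transforms
from PIL import Image

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1).eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# "cat.jpg" is a placeholder input image.
image = preprocess(Image.open("cat.jpg").convert("RGB")).unsqueeze(0)
image.requires_grad_(True)

# Gradient of the top class score with respect to the input pixels:
# large gradient magnitudes mark pixels the prediction is most sensitive to.
score = model(image).max()
score.backward()
saliency = image.grad.abs().max(dim=1)[0]  # collapse the color channels
print(saliency.shape)  # torch.Size([1, 224, 224])
```

The map is produced by a separate, post hoc computation layered on top of the trained classifier.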

But the explanation model does not necessarily provide a breakdown of the inner logic of the AI algorithm it investigates. “Explanation here refers to an understanding of how a model works, as opposed to an explanation of how the world works,” Rudin writes in her paper.
