Royce  Reinger

1677540900

AIX360: Interpretability & Explainability Of Data & ML Models

AI Explainability 360 (v0.2.1)

The AI Explainability 360 toolkit is an open-source library that supports interpretability and explainability of datasets and machine learning models. The AI Explainability 360 Python package includes a comprehensive set of algorithms that cover different dimensions of explanations along with proxy explainability metrics.

The AI Explainability 360 interactive experience provides a gentle introduction to the concepts and capabilities by walking through an example use case for different consumer personas. The tutorials and example notebooks offer a deeper, data scientist-oriented introduction. The complete API is also available.

There is no single approach to explainability that works best. There are many ways to explain: data vs. model, directly interpretable vs. post hoc explanation, local vs. global, etc. It may therefore be confusing to figure out which algorithms are most appropriate for a given use case. To help, we have created some guidance material and a chart that can be consulted.
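As a rough illustration of how these dimensions combine, the sketch below maps three questions about a use case onto the five algorithm categories named in this README. The function name and its parameters are hypothetical and are not part of the aix360 package; the guidance material and chart remain the reference for choosing an algorithm.

```python
def explanation_category(explain_data, directly_interpretable=False, local=False):
    """Return the algorithm category (as named in this README) for a use case.

    Hypothetical helper for illustration only; it is not part of aix360.
    """
    if explain_data:
        # The goal is to understand the dataset itself rather than a model.
        return "Data explanation"
    # Otherwise the explanation targets a model: pick scope and kind.
    scope = "Local" if local else "Global"
    kind = "direct" if directly_interpretable else "post-hoc"
    return "{} {} explanation".format(scope, kind)


# Example: explaining individual predictions of an already trained black-box model.
print(explanation_category(explain_data=False, directly_interpretable=False, local=True))
```

The returned string only names a category; picking a concrete algorithm within it still depends on the data type and the audience for the explanation.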

We have developed the package with extensibility in mind. This library is still in development. We encourage you to contribute your explainability algorithms, metrics, and use cases. To get started as a contributor, please join the AI Explainability 360 Community on Slack by requesting an invitation here. Please review the instructions to contribute code and Python notebooks here.

Supported explainability algorithms

Data explanation

Local post-hoc explanation

Local direct explanation

Global direct explanation

Global post-hoc explanation

Supported explainability metrics

Setup

Supported Configurations:

OS         Python version
macOS      3.6
Ubuntu     3.6
Windows    3.6
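
Before installing, it can help to confirm that the active interpreter matches the supported configuration listed above. The following is a minimal sketch; the constant and function names are illustrative and do not come from the toolkit.

```python
import sys

# Taken from the Supported Configurations list in this README.
SUPPORTED_PYTHON = (3, 6)


def is_supported(version_info=None):
    """Return True when the major.minor version matches the supported one."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == SUPPORTED_PYTHON


if __name__ == "__main__":
    current = "{}.{}".format(*sys.version_info[:2])
    if is_supported():
        print("Python {} matches the supported configuration.".format(current))
    else:
        print("Python {} is not listed as supported; consider a 3.6 environment.".format(current))
```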

(Optional) Create a virtual environment

AI Explainability 360 requires specific versions of many Python packages which may conflict with other projects on your system. A virtual environment manager is strongly recommended to ensure dependencies may be installed safely. If you have trouble installing the toolkit, try this first.

Conda

Conda is recommended for all configurations though Virtualenv is generally interchangeable for our purposes. Miniconda is sufficient (see the difference between Anaconda and Miniconda if you are curious) and can be installed from here if you do not already have it.

Then, to create a new Python 3.6 environment, run:

conda create --name aix360 python=3.6
conda activate aix360

The shell should now look like (aix360) $. To deactivate the environment, run:

(aix360)$ conda deactivate

The prompt will return to $ or (base)$.

Note: Older versions of conda may use source activate aix360 and source deactivate (activate aix360 and deactivate on Windows).

Installation

Clone the latest version of this repository:

(aix360)$ git clone https://github.com/Trusted-AI/AIX360

If you'd like to run the examples and tutorial notebooks, download the datasets now and place them in their respective folders as described in aix360/data/README.md.

Then, navigate to the root directory of the project, which contains the setup.py file, and run:

(aix360)$ pip install -e .

If you face any issues, please try upgrading pip and setuptools and uninstall any previous versions of aix360 before attempting the above step again.

(aix360)$ pip install --upgrade pip setuptools
(aix360)$ pip uninstall aix360

Running in Docker

  • From the AIX360 directory, build the container image from the Dockerfile using docker build -t aix360_docker .
  • Start the container using the command docker run -it -p 8888:8888 aix360_docker:latest bash, assuming port 8888 is free on your machine.
  • Inside the container, start JupyterLab using the command jupyter lab --allow-root --ip 0.0.0.0 --port 8888 --no-browser
  • Access the sample tutorials on your machine using the URL localhost:8888

PIP Installation of AI Explainability 360

If you would like to start using the AI Explainability 360 toolkit quickly without cloning this repository, you can install the aix360 PyPI package as follows.

(your environment)$ pip install aix360

If you follow this approach, you may need to download the notebooks in the examples folder separately.
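
Whichever installation route you choose, a quick way to see whether the package is visible to the current environment is to ask Python's import machinery. This is only a sketch using the standard library; the helper name is hypothetical, and it reports whether the top-level package can be located, not whether every optional dependency is working.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    name = "aix360"
    status = "found" if is_installed(name) else "not found"
    print("{}: {}".format(name, status))
```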

Using AI Explainability 360

The examples directory contains a diverse collection of Jupyter notebooks that use AI Explainability 360 in various ways. Both the examples and the tutorial notebooks illustrate working code using the toolkit. Tutorials provide additional discussion that walks the user through the various steps of the notebook. See the details about tutorials and examples here.

Citing AI Explainability 360

If you are using AI Explainability 360 for your work, we encourage you to

  • Cite the following paper. The bibtex entry is as follows:
@misc{aix360-sept-2019,
title = "One Explanation Does Not Fit All: A Toolkit and Taxonomy of AI Explainability Techniques",
author = {Vijay Arya and Rachel K. E. Bellamy and Pin-Yu Chen and Amit Dhurandhar and Michael Hind
and Samuel C. Hoffman and Stephanie Houde and Q. Vera Liao and Ronny Luss and Aleksandra Mojsilovi\'c
and Sami Mourad and Pablo Pedemonte and Ramya Raghavendra and John Richards and Prasanna Sattigeri
and Karthikeyan Shanmugam and Moninder Singh and Kush R. Varshney and Dennis Wei and Yunfeng Zhang},
month = sept,
year = {2019},
url = {https://arxiv.org/abs/1909.03012}
}

  • Put a star on this repository.
  • Share your success stories with us and others in the AI Explainability 360 Community.

AIX360 Videos

  • Introductory video to AI Explainability 360 by Vijay Arya and Amit Dhurandhar, September 5, 2019 (35 mins)

Acknowledgements

AIX360 is built with the help of several open source packages, all of which are listed in setup.py.

Download Details:

Author: Trusted-AI
Source Code: https://github.com/Trusted-AI/AIX360 
License: Apache-2.0 license

iOS App Dev

1620466520

Your Data Architecture: Simple Best Practices for Your Data Strategy

If you accumulate data on which you base your decision-making as an organization, you most probably need to think about your data architecture and consider possible best practices. Gaining a competitive edge, remaining customer-centric to the greatest extent possible, and streamlining processes to get on-the-button outcomes can all be traced back to an organization’s capacity to build a future-ready data architecture.

In what follows, we offer a short overview of the overarching capabilities of data architecture. These include user-centricity, elasticity, robustness, and the capacity to ensure the seamless flow of data at all times. Added to these are automation enablement, plus security and data governance considerations. These points form our checklist for what we perceive to be an anticipatory analytics ecosystem.

Gerhard  Brink

1620629020

Getting Started With Data Lakes

Frameworks for Efficient Enterprise Analytics

The opportunities big data offers also come with very real challenges that many organizations are facing today. Often, it’s finding the most cost-effective, scalable way to store and process boundless volumes of data in multiple formats that come from a growing number of sources. Then organizations need the analytical capabilities and flexibility to turn this data into insights that can meet their specific business objectives.

This Refcard dives into how a data lake helps tackle these challenges at both ends — from its enhanced architecture that’s designed for efficient data ingestion, storage, and management to its advanced analytics functionality and performance flexibility. You’ll also explore key benefits and common use cases.

Introduction

As technology continues to evolve with new data sources, such as IoT sensors and social media churning out large volumes of data, there has never been a better time to discuss the possibilities and challenges of managing such data for varying analytical insights. In this Refcard, we dig deep into how data lakes solve the problem of storing and processing enormous amounts of data. While doing so, we also explore the benefits of data lakes, their use cases, and how they differ from data warehouses (DWHs).


This is a preview of the Getting Started With Data Lakes Refcard. To read the entire Refcard, please download the PDF from the link above.

Enterprise Data Management: Stick to the Basics

Lots of people have increasing volumes of data and are trying to run data management programs to better sort it. Interestingly, people’s problems are pretty much the same throughout different sectors of any industry, and data management helps them configure solutions.

The fundamentals of enterprise data management (EDM), which one uses to tackle these kinds of initiatives, are the same whether one is in the health sector, a telco, a travel company, a government agency, or elsewhere. Therefore, the fundamental practices that one needs to follow to manage data are similar from one industry to another.

For example, suppose you’re about to set off and design a program. It may be your integration platform project or your big warehouse project; however, the principles for designing that program of work are pretty much the same regardless of the actual details of the project.

Vern  Greenholt

1593972060

Interpreting the model is for humans, not for computers

Critique of pure interpretation

The scientific method, the tool that has served us to find explanations about how things work and to make decisions, has brought us the biggest challenge, one that as of 2020 we have probably still not overcome: giving useful narratives to numbers, also known as “interpretation”.

As a matter of clarification, the scientific method is the pipeline of finding evidence to prove or disprove hypotheses. Science, and how things work, covers everything from the natural sciences to the economic sciences. Most delightfully, by evidence we mean “data”, and humanity understands it that way too. And data cannot be anything less than numbers.

Leaving space for generality, the problem of interpretation is particularly entertaining along the path of statistical analyses within the scientific method pipeline. This means finding models that are written in a mathematical language and finding an interpretation for them within the context that delivered the data.

Interpreting a model has two crucial implications that many scientists or science technicians have long skipped (hopefully not forgotten). The first one relies on the fact that if there is a model to interpret now, there must have been a research question asked before, in a context that delivered data to build such a model. The second one is that the narratives we need to create about our model can do much more by expressing ideas about a number within the context of the research question rather than purely inside the model. After all, as of 2020, decisions are made by humans based on the meaning of those numbers, not really by computers. This last statement is important, because in the 21st century we might actually get to the point where computers take over from us in many tasks and end up making decisions for us. For this they will need to communicate those decisions among their network. Only then will the human narratives not count, since computers only understand numbers.

As statisticians, we have been adopting the practice of finding problems to solve, finding questions to answer, and finding answers to explain using available data. This mindset has kept us running in a circle of nonsense narratives and interpretations, because problems are not found or looked for. Problems and questions emerge from all ongoing interactions and reactions of different phenomena. This fact implies that statistical models and other analytical approaches are tools to be used upon the core problem or question; they are not the spine.

This ugly art of fitting a linear regression on some data and saying that “the beta coefficient is the amount of units that y increases when x increases one unit”, or the art of calculating an average and saying that “it is the value around which we can find the majority of the data points”, is a ruthless product that we statisticians have been offering to the scientific method.

The bubble of interpretation

Teaching statistics has made clear to us that people can perfectly understand the way the models work and how to train them to get the numbers. However, what we still have not digested is the fact that, out of all the numbers that are produced when training models, most are simply noncommunicable to non-statistical people. Let us present some of these numbers whose communication is dark:

  • The p-value is one particular concept that may in fact deserve an entire essay of its own. On social media, for instance, we constantly see people asking for an explanation of the p-value, and right away there is a storm of statisticians offering their own interpretations.
  • The odds ratio stars in this debate. After 10 years of working in this field, we must admit that it has never been possible to explain to an expert in another field how to think about the odds ratio. Even Wikipedia has tried and, in our opinion, has more than failed at it.
  • The beta coefficients of a dummy variable in a logistic regression are of the same kind. We have the feeling that they are (the logarithms of) the odds ratios of the categories with respect to the baseline category. But how can we make this understandable and actionable in practice? This we simply don’t know.

This problem is central for the scientific and statistical community. The models that we train lose their value because of such a lack of interpretation.

After some years of talking with colleagues about this problem and looking for an appropriate framework that clears up the problem of interpretation, we have come to one obvious conclusion: the interpretation process exists and can only happen within a given context. It makes no sense to fight for the interpretation process inside the model. The inner processes of the model are all numerical, and these results can be communicated and understood only by numerical, statistical people. Making interpretations of numbers inside the models is a bubble of rephrasing. In order to interpret the results of a model so that they become tools for taking actions, it is essential to keep in mind the context from which the data comes and in which the research questions are asked.
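
To make the numbers in the last two bullets concrete, here is a minimal sketch with made-up counts; the figures and the function name are illustrative assumptions, not data from any study. For a logistic regression with a single dummy variable, the fitted beta equals the logarithm of the odds ratio between the category and the baseline, so exponentiating beta recovers the odds ratio.

```python
import math


def odds_ratio(events_cat, non_events_cat, events_base, non_events_base):
    """Odds of the event in a category divided by the odds in the baseline."""
    odds_cat = events_cat / non_events_cat
    odds_base = events_base / non_events_base
    return odds_cat / odds_base


# Hypothetical counts: 30 events and 70 non-events in the category,
# 10 events and 90 non-events in the baseline.
ratio = odds_ratio(30, 70, 10, 90)

# The coefficient a logistic regression with this single dummy variable would fit.
beta = math.log(ratio)

print("odds ratio:", round(ratio, 3))
print("beta (log odds ratio):", round(beta, 3))
```

The arithmetic is the easy part. Whether "the odds are about four times higher than in the baseline" means anything to the person who asked the research question is exactly the problem described above.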

Mikel  Okuneva

1600012800

What Exactly Is Data Governance?

The first step is to understand what data governance is. Data Governance is an overloaded term and means different things to different people. It has been helpful to define Data Governance based on the outcomes it is supposed to deliver. In my case, Data Governance is any task required for:

  • Compliance: Data life cycle and usage are in accordance with laws and regulations.
  • Privacy: Data is protected as per regulations and user expectations.
  • Security: Data and data infrastructure are adequately protected.

Why is Data Governance hard?

Compliance, Privacy, and Security are different approaches to ensuring that data collectors and processors do not gain unregulated insights. It is hard to ensure that the right data governance framework is in place to meet this goal. An interesting example of an unexpected insight is the sequence of events that led to the leakage of the taxi cab tipping history of celebrities.
