The AI Explainability 360 toolkit is an open-source library that supports interpretability and explainability of datasets and machine learning models. The AI Explainability 360 Python package includes a comprehensive set of algorithms that cover different dimensions of explanations along with proxy explainability metrics.
The AI Explainability 360 interactive experience provides a gentle introduction to the concepts and capabilities by walking through an example use case for different consumer personas. The tutorials and example notebooks offer a deeper, data scientist-oriented introduction. The complete API is also available.
There is no single approach to explainability that works best. There are many ways to explain: data vs. model, directly interpretable vs. post hoc explanation, local vs. global, etc. It may therefore be confusing to figure out which algorithms are most appropriate for a given use case. To help, we have created some guidance material and a chart that can be consulted.
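As a rough illustration of how these dimensions narrow the choice, the sketch below encodes them as a small lookup table. The `Method` record and the `candidates` helper are hypothetical names, not part of the aix360 API, and the placement of each algorithm is an assumption made for illustration; the guidance material and chart remain the authoritative source.

```python
from collections import namedtuple

# Hypothetical record type for illustration; not part of the aix360 API.
Method = namedtuple("Method", ["name", "target", "kind", "scope"])

# Assumed, illustrative placements only; consult the guidance chart.
CATALOG = [
    Method("ProtoDash", "data", None, None),
    Method("BRCG", "model", "directly interpretable", "global"),
    Method("ProfWeight", "model", "post hoc", "global"),
    Method("CEM", "model", "post hoc", "local"),
]


def candidates(target, kind=None, scope=None):
    """Return names of catalog entries matching the requested dimensions."""
    return [
        m.name
        for m in CATALOG
        if m.target == target
        and (kind is None or m.kind == kind)
        and (scope is None or m.scope == scope)
    ]


print(candidates("model", kind="post hoc", scope="local"))
```

Leaving `kind` or `scope` unset widens the search along that dimension, which mirrors how one might start from "data vs. model" and refine from there.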
We have developed the package with extensibility in mind. This library is still in development. We encourage you to contribute your explainability algorithms, metrics, and use cases. To get started as a contributor, please join the AI Explainability 360 Community on Slack by requesting an invitation here. Please review the instructions to contribute code and Python notebooks here.
Supported Configurations:
OS | Python version |
---|---|
macOS | 3.6 |
Ubuntu | 3.6 |
Windows | 3.6 |
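A quick way to confirm that the interpreter matches the table above is a small check like the following. This is a minimal sketch: `SUPPORTED` and `is_supported` are hypothetical names introduced here, and the version set simply mirrors the table.

```python
import sys

# Versions taken from the Supported Configurations table.
SUPPORTED = {(3, 6)}


def is_supported(version_info=None):
    """True if the interpreter's major.minor version is in SUPPORTED."""
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:2]) in SUPPORTED


if not is_supported():
    print("Warning: Python %d.%d is not a listed configuration" % sys.version_info[:2])
```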
AI Explainability 360 requires specific versions of many Python packages which may conflict with other projects on your system. A virtual environment manager is strongly recommended to ensure dependencies may be installed safely. If you have trouble installing the toolkit, try this first.
Conda is recommended for all configurations though Virtualenv is generally interchangeable for our purposes. Miniconda is sufficient (see the difference between Anaconda and Miniconda if you are curious) and can be installed from here if you do not already have it.
Then, to create a new Python 3.6 environment, run:
conda create --name aix360 python=3.6
conda activate aix360
The shell prompt should now look like (aix360)$. To deactivate the environment, run:
(aix360)$ conda deactivate
The prompt will return to $ or (base)$.
Note: Older versions of conda may use source activate aix360 and source deactivate (activate aix360 and deactivate on Windows).
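If the prompt is customized and does not show the environment name, the active environment can also be checked from Python. The sketch below relies on the CONDA_DEFAULT_ENV environment variable, which conda sets on activation; `active_conda_env` is a hypothetical helper name.

```python
import os


def active_conda_env(environ=None):
    """Return the active conda environment name, or None if none is set."""
    environ = os.environ if environ is None else environ
    # conda exports CONDA_DEFAULT_ENV when an environment is activated.
    return environ.get("CONDA_DEFAULT_ENV") or None


print("Active conda environment:", active_conda_env())
```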
Clone the latest version of this repository:
(aix360)$ git clone https://github.com/Trusted-AI/AIX360
If you'd like to run the examples and tutorial notebooks, download the datasets now and place them in their respective folders as described in aix360/data/README.md.
Then, navigate to the root directory of the project, which contains the setup.py file, and run:
(aix360)$ pip install -e .
If you face any issues, please try upgrading pip and setuptools and uninstall any previous versions of aix360 before attempting the above step again.
(aix360)$ pip install --upgrade pip setuptools
(aix360)$ pip uninstall aix360
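To run in Docker instead: in the AIX360 directory, build the container image from the Dockerfile using docker build -t aix360_docker .
Start the container using docker run -it -p 8888:8888 aix360_docker:latest bash, assuming port 8888 is free on your machine.
Inside the container, start Jupyter Lab using jupyter lab --allow-root --ip 0.0.0.0 --port 8888 --no-browser
Then open localhost:8888 in a browser on your machine.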
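The Docker steps assume port 8888 is free on the host. One way to check this beforehand is to try binding the port, as in the sketch below. `port_is_free` is a hypothetical helper; the default host of 0.0.0.0 is an assumption that the port is published on all interfaces.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.bind((host, port))
        return True
    except OSError:
        # Bind failed: the port is in use or binding is not permitted.
        return False


print("Port 8888 free:", port_is_free(8888))
```

If the check reports that the port is taken, a different host port can be mapped instead, for example -p 8889:8888, and the notebooks opened at localhost:8889.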
If you would like to quickly start using the AI Explainability 360 toolkit without cloning this repository, you can install the aix360 PyPI package as follows.
(your environment)$ pip install aix360
If you follow this approach, you may need to download the notebooks in the examples folder separately.
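Whichever install route is used, a small check like the one below can confirm that the package is visible to the current interpreter. This is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper name.

```python
import importlib.util


def is_installed(package):
    """True if a top-level `package` can be located on the import path."""
    return importlib.util.find_spec(package) is not None


print("aix360 installed:", is_installed("aix360"))
```

Running it inside the activated environment matters: the same check from a different environment may report False even after a successful install.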
The examples directory contains a diverse collection of Jupyter notebooks that use AI Explainability 360 in various ways. Both examples and tutorial notebooks illustrate working code using the toolkit. Tutorials provide additional discussion that walks the user through the various steps of the notebook. See the details about tutorials and examples here.
If you are using AI Explainability 360 for your work, we encourage you to cite the following paper:
@misc{aix360-sept-2019,
title = "One Explanation Does Not Fit All: A Toolkit and Taxonomy of AI Explainability Techniques",
author = {Vijay Arya and Rachel K. E. Bellamy and Pin-Yu Chen and Amit Dhurandhar and Michael Hind
and Samuel C. Hoffman and Stephanie Houde and Q. Vera Liao and Ronny Luss and Aleksandra Mojsilovi\'c
and Sami Mourad and Pablo Pedemonte and Ramya Raghavendra and John Richards and Prasanna Sattigeri
and Karthikeyan Shanmugam and Moninder Singh and Kush R. Varshney and Dennis Wei and Yunfeng Zhang},
month = sept,
year = {2019},
url = {https://arxiv.org/abs/1909.03012}
}
Put a star on this repository.
Share your success stories with us and others in the AI Explainability 360 Community.
AIX360 is built with the help of several open source packages, all of which are listed in setup.py.
Author: Trusted-AI
Source Code: https://github.com/Trusted-AI/AIX360
License: Apache-2.0 license
#machinelearning #python #deeplearning #artificialintelligence
If you accumulate data on which you base your decision-making as an organization, you most probably need to think about your data architecture and consider possible best practices. Gaining a competitive edge, remaining customer-centric to the greatest extent possible, and streamlining processes to get on-the-button outcomes can all be traced back to an organization’s capacity to build a future-ready data architecture.
In what follows, we offer a short overview of the overarching capabilities of data architecture. These include user-centricity, elasticity, robustness, and the capacity to ensure the seamless flow of data at all times. Added to these are automation enablement, plus security and data governance considerations. These points form our checklist for what we perceive to be an anticipatory analytics ecosystem.
#big data #data science #big data analytics #data analysis #data architecture #data transformation #data platform #data strategy #cloud data platform #data acquisition
The opportunities big data offers also come with very real challenges that many organizations are facing today. Often, it’s finding the most cost-effective, scalable way to store and process boundless volumes of data in multiple formats that come from a growing number of sources. Then organizations need the analytical capabilities and flexibility to turn this data into insights that can meet their specific business objectives.
This Refcard dives into how a data lake helps tackle these challenges at both ends — from its enhanced architecture that’s designed for efficient data ingestion, storage, and management to its advanced analytics functionality and performance flexibility. You’ll also explore key benefits and common use cases.
As technology continues to evolve with new data sources, such as IoT sensors and social media churning out large volumes of data, there has never been a better time to discuss the possibilities and challenges of managing such data for varying analytical insights. In this Refcard, we dig deep into how data lakes solve the problem of storing and processing enormous amounts of data. While doing so, we also explore the benefits of data lakes, their use cases, and how they differ from data warehouses (DWHs).
This is a preview of the Getting Started With Data Lakes Refcard. To read the entire Refcard, please download the PDF from the link above.
#big data #data analytics #data analysis #business analytics #data warehouse #data storage #data lake #data lake architecture #data lake governance #data lake management
Lots of people have increasing volumes of data and are trying to run data management programs to better sort it. Interestingly, people’s problems are pretty much the same throughout different sectors of any industry, and data management helps them configure solutions.
The fundamentals of enterprise data management (EDM), which one uses to tackle these kinds of initiatives, are the same whether one is in the health sector, a telco, a travel company, or a government agency. The fundamental practices needed to manage data are therefore similar from one industry to another.
For example, suppose you’re about to set off and design a program. It may be your integration platform project or your big warehouse project; either way, the principles for designing that program of work are pretty much the same regardless of the actual details of the project.
#big data #bigdata #big data analytics #data management #data modeling #data governance #enterprise data #enterprise data management #edm
Critique of pure interpretation
The scientific method, the tool that has served us to find explanations about how things work and to make decisions, has brought us the biggest challenge, one that as of 2020 we have probably still not overcome: giving useful narratives to numbers, also known as “interpretation”.
Just as a matter of clarification, the scientific method is the pipeline of finding evidence to prove or disprove hypotheses. Science, and how things work, covers everything from the natural sciences to the economic sciences. Most delightfully, by evidence we mean, and humanity understands, “data”. And data cannot be anything less than numbers.
Leaving space for generality, the problem of interpretation is particularly entertaining along the path of statistical analyses within the scientific method pipeline. This means finding models that are written in a mathematical language and finding an interpretation for them within the context that delivered the data.
Interpreting a model has two crucial implications that many scientists or science technicians have long skipped (hopefully not forgotten). The first is that if there is a model to interpret now, a research question must have been asked earlier, in a context that delivered the data used to build that model. The second is that the narratives we create about our model can do much more by expressing ideas about a number within the context of the research question rather than purely inside the model. After all, as of 2020, decisions are made by humans based on the meaning of those numbers, not really by computers. This last statement is important, because in the 21st century we might actually reach the point where computers take over many tasks from us and end up making decisions for us. For this they will need to communicate those decisions across their network. At that point, human narratives will not count, since computers only understand numbers.
As statisticians, we have adopted the practice of finding problems to solve, questions to answer, and answers to explain using available data. This mindset has kept us running in a circle of nonsense narratives and interpretations, because problems are not found or looked for. Problems and questions emerge from all the ongoing interactions and reactions of different phenomena. This implies that statistical models and other analytical approaches are tools to be applied to the core problem or question; they are not the spine.
This ugly art of fitting a linear regression on some data and saying that “the beta coefficient is the amount of units that y increases when x increases one unit”, or the art of calculating an average and saying that “it is the value around which we can find the majority of the data points” is a ruthless product that we statisticians have been offering to the scientific method.
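To make the textbook reading concrete, here is a minimal pure-Python sketch of ordinary least squares with one predictor. The data are made up for illustration and `fit_line` and `predict` are hypothetical helpers. It shows only the arithmetic fact behind the quoted sentence, namely that fitted predictions one unit of x apart differ by exactly beta; it says nothing about what that number means for the research question.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = alpha + beta * x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


# Made-up data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
alpha, beta = fit_line(xs, ys)


def predict(x):
    """Fitted value at x using the estimated coefficients."""
    return alpha + beta * x


# The gap between predictions one unit apart equals beta.
print(beta, predict(2.0) - predict(1.0))
```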
The bubble of interpretation
Teaching statistics has made it clear to us that people can perfectly understand the way the models work and how to train them to get the numbers. However, what we have still not digested is the fact that, of all the numbers produced when training models, most are simply not communicable to non-statisticians. Let us present some of these numbers whose communication is murky:
#interpretation #model-interpretability #semantics #data #data-analysis #data analysis
The first step is to understand what data governance is. Data Governance is an overloaded term and means different things to different people. It has been helpful to define Data Governance based on the outcomes it is supposed to deliver. In my case, Data Governance is any task required for:
Compliance, Privacy, and Security are different approaches to ensure that data collectors and processors do not gain unregulated insights. It is hard to ensure that the right data governance framework is in place to meet this goal. An interesting example of an unexpected insight is the sequence of events that led to the leak of celebrities' taxi cab tipping history.
#databases #big-data-and-governance #data-lineage #data-governance #what-is-data-governance #data-governance-explained #data-governance-and-privacy #data-governance-problems