How to Tell if Something’s a Coincidence or a Causality

Most of us regularly make the mistake of unwittingly confusing correlation with causation, a tendency reinforced by media headlines like this one:

https://www.vox.com/2016/4/13/11420230/life-expectancy-income

And I could go on for days. If you believed a news item like the one above at face value, you have fallen prey to a classic statistical fallacy: assuming that correlation implies causation.

What are correlation and causation?

● Correlation indicates a relationship between two events: for instance, the two events tend to happen at the same time.

● Causation indicates that the occurrence of one event caused the occurrence of a second event; one event makes the other happen. The two events also happen at the same time, but there is a causal mechanism linking them! This is also referred to as cause and effect.

It’s easy to see the problem with that logic in these examples:

  • “After I washed my car, it rained. Therefore washing my car causes rain.”
  • “When I got into the bathtub, the phone rang. Therefore getting in the bath will lead to the phone ringing.”
  • “We won our baseball game when I was wearing these socks, so it must be the lucky socks that caused our win.”

The place where this fallacy shows up the most often is in media headlines, which unfortunately is where most people get their science information and news.[2]
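To make the distinction concrete, here is a minimal simulation sketch in Python (the variables and numbers are invented for illustration): a hidden common cause, temperature, drives both ice cream sales and drownings, producing a strong correlation with no causation in either direction.

import numpy as np

rng = np.random.default_rng(0)

# A hidden common cause: daily temperature
temperature = rng.normal(20, 8, size=1000)

# Both variables depend on temperature, not on each other
ice_cream_sales = 5 * temperature + rng.normal(0, 10, size=1000)
drownings = 0.3 * temperature + rng.normal(0, 2, size=1000)

# Strong correlation (~0.75) despite no causal link between the two
print(np.corrcoef(ice_cream_sales, drownings)[0, 1])

# Holding the common cause roughly fixed (only hot days), the
# correlation becomes noticeably weaker
hot = temperature > 25
print(np.corrcoef(ice_cream_sales[hot], drownings[hot])[0, 1])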


Michael Hamill

Yoshua Bengio & Why He Is Bullish About Causal Learning

Recently, Yoshua Bengio and researchers from the University of Montreal, the Max-Planck Institute for Intelligent Systems and Google Research demonstrated how causal representation learning contributes to the robustness and generalisation of machine learning models. The team reviewed the fundamental concepts of causal inference and related them to crucial open problems of machine learning, including transfer and generalisation.

Attaining general intelligence is one of the key goals of machine learning and deep learning. As things stand, machine learning techniques fall short at some crucial feats where natural intelligence excels, including transfer to new problems and other forms of generalisation.

#opinions #causal learning #causal reasoning #causality in machine learning #generalisation in deep learning #generalisation of machine learning #statistical reasoning #yoshua bengio

Causal inference for data scientists: a skeptical view

Introduction

The purpose of this post is to show why causal inference is hard, how it fails us, and why DAGs don’t help.

Machine learning practitioners are concerned with prediction, rarely with explanation. This is a luxury. We even work on problems for which no data generating process can possibly be written down. Nobody believes that an LSTM model trained on Shakespeare’s plays works because it approximates Shakespeare. Yet it works.

In recent years, new tools for causal inference have become available for a broader audience. These tools promise to help non-experts not only predict, but explain patterns in data. Directed acyclic graphs and do-calculus are among the most influential ones.

People love shiny tools, and there is danger in that. New tools arrive with a rush to adopt them, a feeling of being in the vanguard, and new opportunities. That often makes them unreliable: they get misused, or substituted for better theory, design, or data.

In this post, I focus on directed acyclic graphs and the Python library DoWhy, because DAGs are really popular in the machine learning community. The points I make apply equally to the potential outcomes framework, or any other formal language for expressing causality.

Why causal inference is hard, in theory

Causal inference relies on causal assumptions. Assumptions are beliefs that allow movement from statistical associations to causation.

Randomized experiments are the gold standard for causal inference because the treatment assignment is random and physically manipulated: one group gets the treatment, one does not. The assumptions here are straightforward, securable by design, and can be conveniently defended.

When there is no control over treatment assignment, say with observational data, researchers try to model it. Modeling here is equivalent to saying “we assume that after adjusting for age, gender, social status, and smoking, runners and non-runners are so similar to each other as if they were randomly assigned to running.” Then one can regress life expectancy on running, declare that “running increases life expectancy by n%,” and call it a day.
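As an illustration of what such “adjusting” looks like operationally, here is a minimal sketch (the data file and column names are hypothetical) of the regress-and-declare-victory approach being criticized:

import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical observational dataset: one row per person
df = pd.read_csv("runners.csv")  # columns: life_expectancy, runs, age, gender, ses, smokes

# "Adjusting" for confounders simply means adding them as regressors.
# The causal reading of the 'runs' coefficient rests entirely on the
# untestable assumption that these covariates capture everything that
# drives both running and longevity.
model = smf.ols("life_expectancy ~ runs + age + gender + ses + smokes", data=df).fit()
print(model.params["runs"])  # the number then reported as "the effect of running"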

The logic behind this approach is clunky. It implicitly assumes we know exactly why people start running or live long, and that the only missing piece is the very thing we are trying to estimate; a story that is not very believable and a bit circular. It also assumes that, by a happy coincidence, all parts of our model have available empirical proxies measured without error. Finally, since there is no principled way to check how well the chosen model approximates the real assignment mechanism, all of its assumptions can be debated for eternity.

This brings us to a situation best summarized by Jasjeet Sekhon [1]:

“Without an experiment, natural experiment, a discontinuity, or some other strong design, no amount of econometric or statistical modeling can make the move from correlation to causation persuasive”

Why causal inference is hard, in practice

The concerns raised above are better demonstrated with practical examples. Although there are plenty, I will stick to three: one each from economics, epidemiology, and political science.

In 1986 Robert LaLonde showed that econometric procedures do not replicate experimental results. He utilized an experiment in which individuals were randomly selected into job training programs. Randomization allowed him to estimate a program’s unbiased effect on earnings. He then asked: would we have gotten the same estimate without randomization? To mimic observational data, LaLonde constructed several non-experimental control groups. After comparing the estimates, he concluded that econometric procedures fail to replicate experimental results [2].

Epidemiology has the same problems. Consider the story of HDL cholesterol and heart disease. It was believed that ‘good cholesterol’ protects against coronary heart disease; researchers even declared the observational studies to be robust to covariate adjustments. Yet several years later, randomized experiments demonstrated that HDL doesn’t protect your heart. This situation is not unique for epidemiology, and many epidemiological findings are later overturned by randomized controlled trials [3].

Democracy-growth studies were a hot topic in political science back in the day. Researchers put GDP per capita or the like on the left side of their equations, democracy on the right, and, to avoid appearing unsophisticated, threw in a bunch of controls: life expectancy, educational attainment, population size, and foreign direct investment, among others. Of the 470 estimates presented in 81 papers published prior to 2006, 16% showed a statistically significant negative effect of democracy on growth, 20% a negative but insignificant effect, 38% a positive but still insignificant effect, and the remaining 26% (you will be really surprised here) a positive and statistically significant effect [4].

The pattern is clear: no matter how confident researchers are in their observational research, it does not necessarily bring them closer to the truth.

#machine-learning #causality #causal-inference #data-science #data analysis

Tyrique Littel

Using the Granger Causality Test to Know If One Time Series Impacts Another

The Granger causality test is used to determine whether one time series is useful for forecasting another, by investigating causality between two variables in a time series. The method is a probabilistic account of causality: it uses observed data sets to find patterns of correlation. One good thing about time series vector autoregression (VAR) is that we can test ‘causality’ in some sense. The test was first proposed by Granger (1969), which is why we refer to it as Granger causality.

A simple mechanism to define Granger causality:

It is based on the idea that if X causes Y, then a forecast of Y based on previous values of Y AND previous values of X should outperform a forecast of Y based on previous values of Y alone.

Granger causality should not be used to test whether a lag of Y causes Y; it is generally applied to exogenous (non-Y-lag) variables only. In simple terms, ‘X is said to Granger-cause Y if Y can be better predicted using the histories of both X and Y than using the history of Y alone.’
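In equation form (a standard textbook formulation, not from the original post), the test compares a restricted autoregression against an unrestricted one that adds lags of X:

Restricted model (history of Y only):
y_t = \alpha_0 + \sum_{i=1}^{p} \alpha_i y_{t-i} + e_t

Unrestricted model (histories of Y and X):
y_t = \alpha_0 + \sum_{i=1}^{p} \alpha_i y_{t-i} + \sum_{j=1}^{p} \beta_j x_{t-j} + e_t

The null hypothesis H_0: \beta_1 = \cdots = \beta_p = 0 (“X does not Granger-cause Y”) is evaluated with an F-test comparing the two fits; rejecting H_0 means the history of X improves the forecast of Y.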

When performing the Granger causality test, we need to make two assumptions:

  1. Future values cannot cause past values.
  2. The cause contains notably distinct information about the effect that is not available elsewhere.

Below is a Python implementation of the Granger test using the statsmodels package.

Let’s write the code:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller, kpss, acf, pacf, grangercausalitytests
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
%matplotlib inline

## Upload the data into a DataFrame; keep Value_Date as a regular
## column, since the plots below reference it by name
df = pd.read_csv('IxR_Data.csv', parse_dates=['Value_Date'])

## Get a few records
df.head()


We will do some exploratory data analysis first; most of it is not relevant to show here, but I just want to show you how the two series behave over time:

ax = sns.lineplot(x="Value_Date", y="IxD", data=df)
ax1 = sns.lineplot(x="Value_Date", y='RxB', data=df)

[Image: line plot of IxD and RxB over Value_Date]

We can see from the above chart that the two time series move (more or less) in tandem, and this interrelationship suggests that one variable can be used to help predict the other.
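With the data in place, the test itself is a single call to grangercausalitytests. A minimal sketch, assuming (as in the plots above) that we ask whether RxB Granger-causes IxD; the lag order of 4 is an arbitrary choice for illustration:

from statsmodels.tsa.stattools import grangercausalitytests

# grangercausalitytests expects a 2-column array: the first column is
# the variable being predicted (IxD), the second the candidate cause
# (RxB). maxlag=4 runs the test for lag orders 1 through 4.
# (In practice both series should be stationary; difference them first
# if needed.)
results = grangercausalitytests(df[['IxD', 'RxB']], maxlag=4)

# For each lag, the output reports the F-test (and chi-squared
# variants); a small p-value suggests RxB helps predict IxD beyond
# IxD's own history.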

#var #model #granger #granger-causality #causality

Annalise Hyatt

Microsoft’s DoWhy is a Cool Framework for Causal Inference


The human mind has a remarkable ability to associate causes with a specific event. From the outcome of an election to an object dropping on the floor, we are constantly associating chains of events that cause a specific effect. Neuropsychology refers to this cognitive ability as causal reasoning. Computer science and economics study a specific form of causal reasoning known as causal inference, which focuses on exploring relationships between two observed variables. Over the years, machine learning has produced many methods for causal inference, but they remain mostly difficult to use in mainstream applications. Recently, Microsoft Research open-sourced DoWhy, a framework for causal thinking and analysis.

The challenge with causal inference is not that it is a new discipline, quite the opposite, but that the current methods represent a very small and simplistic version of causal reasoning. Most models that try to connect causes, such as linear regression, rely on empirical analysis that makes assumptions about the data. Pure causal inference relies on counterfactual analysis, which is a closer representation of how humans make decisions. Imagine a scenario in which you are traveling with your family for a vacation to an unknown destination. Before and after the vacation you wrestle with a few counterfactual questions:

[Image: examples of counterfactual questions about the trip]

Answering these questions is the focus of causal inference. Unlike supervised learning, causal inference depends on the estimation of unobserved quantities. This is often known as the “fundamental problem” of causal inference, and it implies that a model never has a purely objective evaluation through a held-out test set. In our vacation example, you can observe the effect of going on vacation or of not going, but never both. This challenge forces causal inference to make critical assumptions about the data generation process. Traditional machine learning frameworks for causal inference try to take shortcuts around the “fundamental problem,” resulting in a very frustrating experience for data scientists and developers.
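A tiny simulation (all numbers invented for illustration) makes the “fundamental problem” concrete: each unit has two potential outcomes, but we only ever observe one of them, so a naive comparison of observed groups can be badly biased:

import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# A confounder, e.g. disposable income, which affects both whether
# you take the vacation and how happy you are regardless of it
income = rng.normal(0, 1, n)

# Potential outcomes: happiness without (y0) and with (y1) the
# vacation. The true individual effect is exactly 1 for everyone.
y0 = 2 * income + rng.normal(0, 1, n)
y1 = y0 + 1

# Treatment assignment is confounded: richer people travel more often
treated = rng.random(n) < 1 / (1 + np.exp(-2 * income))

# The fundamental problem: we observe y1 for the treated and y0 for
# the rest, never both for the same unit
y_obs = np.where(treated, y1, y0)

print("true effect:", (y1 - y0).mean())  # exactly 1.0
print("naive diff: ", y_obs[treated].mean() - y_obs[~treated].mean())  # far from 1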

Introducing DoWhy

Microsoft’s DoWhy is a Python-based library for causal inference and analysis that attempts to streamline the adoption of causal reasoning in machine learning applications. Inspired by Judea Pearl’s do-calculus for causal inference, DoWhy combines several causal inference methods under a simple programming model that removes many of the complexities of traditional approaches. Compared to its predecessors, DoWhy makes three key contributions to the implementation of causal inference models:

  1. It provides a principled way of modeling a given problem as a causal graph so that all assumptions are explicit.
  2. It provides a unified interface for many popular causal inference methods, combining the two major frameworks of graphical models and potential outcomes.
  3. It automatically tests the validity of assumptions if possible and assesses the robustness of the estimate to violations.
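As a sketch of what those contributions look like in code, here is DoWhy’s four-step workflow (model, identify, estimate, refute) on a synthetic dataset generated by the library itself; the specific method names below are one choice among several the library supports:

import dowhy.datasets
from dowhy import CausalModel

# Synthetic data with a known true effect (beta), common causes,
# and instruments
data = dowhy.datasets.linear_dataset(
    beta=10, num_common_causes=5, num_instruments=2,
    num_samples=10_000, treatment_is_binary=True)

# 1. Model: encode the assumptions as a causal graph
model = CausalModel(
    data=data["df"],
    treatment=data["treatment_name"],
    outcome=data["outcome_name"],
    graph=data["gml_graph"])

# 2. Identify: derive an estimand (e.g. backdoor) from the graph alone
identified_estimand = model.identify_effect()

# 3. Estimate: apply a statistical method to the identified estimand
estimate = model.estimate_effect(
    identified_estimand, method_name="backdoor.propensity_score_matching")
print(estimate.value)  # should be close to beta = 10

# 4. Refute: stress-test the estimate against violated assumptions
refutation = model.refute_estimate(
    identified_estimand, estimate, method_name="random_common_cause")
print(refutation)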

#2020 aug tutorials #overviews #causality #inference #machine learning #microsoft