Introduction

In this article, we will apply causal inference techniques to a dataset collected for the Infant Health and Development Program (IHDP). In this program, researchers had trained personnel provide comprehensive, high-quality childcare to address the problems faced by low-birth-weight, premature infants. They hoped that this early intervention would have a positive causal effect on children’s cognitive test scores.

The goal of this article is to estimate that causal effect, and thus to determine whether the intervention actually changed children’s cognitive test scores.

Technical Overview

This dataset does not come from a randomized controlled trial: treatment was not randomly assigned, so there may be confounders that influence both the treatment and the outcome. Fortunately, the dataset provides 26 features that are potential confounders. For the sake of this application, we will assume that these are the only confounders.
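To make the role of this assumption concrete, here is a sketch of the standard backdoor adjustment formula in do-notation. The notation is not taken from the dataset's documentation; it assumes that Z = (Z-0, …, Z-25) contains every confounder, that none of these features is itself affected by the treatment, and that every feature profile has some chance of being both treated and untreated.

```latex
% Backdoor adjustment, assuming Z blocks every backdoor path from X to Y
\mathbb{E}[Y \mid do(X = x)] = \mathbb{E}_{Z}\big[\, \mathbb{E}[Y \mid X = x, Z] \,\big]

% Average treatment effect (ATE) of the intervention
\mathrm{ATE} = \mathbb{E}[Y \mid do(X = 1)] - \mathbb{E}[Y \mid do(X = 0)]
```

If the assumption fails, for example because an unmeasured confounder exists, the right-hand side can still be computed from the data, but it no longer equals the causal quantity on the left.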

The dataset comprises a column X (the treatment variable), a column Y (the outcome), and columns Z-0, Z-1, …, Z-25 (feature variables). X is a binary random variable that is labeled 1 if the child was treated, and 0 otherwise. Y is a continuous random variable that indicates the child’s cognitive test score. Z-0, Z-1, …, Z-25 are a mix of continuous and binary random variables that may confound X and Y, and describe features such as the mother’s age, whether or not she smokes, etc.
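As a minimal sketch of how a table with this layout might be analyzed, the example below builds a synthetic stand-in with the same column names (X, Y, Z-0 through Z-25) and compares a naive difference in means with a simple regression adjustment. Everything numeric here is an assumption made for illustration: the data are randomly generated rather than the real IHDP data, the true effect of 4.0 is invented, only Z-0 is made to confound, and the choice of which features are binary is arbitrary. Linear regression is just one possible adjustment method, not necessarily the one used later in this article.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n, n_features = 1000, 26

# Synthetic features: the first 6 stay continuous, the rest are made binary
# (this split is arbitrary and only mimics "a mix of continuous and binary").
Z = rng.normal(size=(n, n_features))
Z[:, 6:] = (Z[:, 6:] > 0).astype(float)

# Treatment assignment depends on Z-0, so Z-0 confounds X and Y.
p_treat = 1.0 / (1.0 + np.exp(-Z[:, 0]))
X = rng.binomial(1, p_treat)

# Outcome: hypothetical true effect of 4.0, plus a direct effect of Z-0.
true_effect = 4.0
Y = true_effect * X + 2.0 * Z[:, 0] + rng.normal(size=n)

z_cols = [f"Z-{i}" for i in range(n_features)]
df = pd.DataFrame(Z, columns=z_cols)
df.insert(0, "X", X)
df.insert(1, "Y", Y)

# Naive estimate: difference in mean outcome between treated and untreated.
naive = df.loc[df["X"] == 1, "Y"].mean() - df.loc[df["X"] == 0, "Y"].mean()

# Regression adjustment: ordinary least squares of Y on an intercept, X, and all Z.
design = np.column_stack([
    np.ones(n),
    df["X"].to_numpy(),
    df[z_cols].to_numpy(),
])
coef, *_ = np.linalg.lstsq(design, df["Y"].to_numpy(), rcond=None)
adjusted = coef[1]  # coefficient on X

print(f"naive difference in means: {naive:.2f}")
print(f"regression-adjusted estimate: {adjusted:.2f}")
```

In this synthetic setup the naive estimate overstates the effect, because treated units tend to have larger Z-0, while the adjusted estimate lands close to the invented true value. With the real data the true effect is unknown, which is why the confounding assumption matters.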


An Application of Causal Inference