Introduction

In this article, I will show how missing values can lead to biased estimates by working through the same dataset in four scenarios, each with a different missing-value mechanism. The content is based on Chapter 15 of [1]. The code to reproduce the results in this article can be found in this notebook. I assume the reader is familiar with building generalized linear models (GLMs) and with using directed acyclic graphs (DAGs) to represent causal relationships.
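To make the idea concrete before the homework example, here is a minimal, hypothetical sketch (not one of the article's four scenarios). It simulates a single standard normal variable and compares two made-up deletion rules. When values go missing completely at random, the mean of the remaining values stays close to the truth. When larger values are more likely to go missing, the naive mean of what remains is pulled downward. The sample size, seed and the logistic deletion rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# A fully observed variable whose true mean is 0.
x = rng.normal(loc=0.0, scale=1.0, size=n)

# Hypothetical mechanism A: each value is dropped with probability 0.5,
# independently of everything (missing completely at random).
keep_random = rng.random(n) > 0.5

# Hypothetical mechanism B: larger values are more likely to be dropped,
# so missingness depends on the (unobserved) value itself.
p_missing = 1.0 / (1.0 + np.exp(-2.0 * x))  # logistic in x
keep_dependent = rng.random(n) > p_missing

mean_full = x.mean()
mean_random = x[keep_random].mean()
mean_dependent = x[keep_dependent].mean()

print(f"all values:          {mean_full:+.3f}")
print(f"random missingness:  {mean_random:+.3f}")
print(f"value-dependent:     {mean_dependent:+.3f}")
```

Simply discarding incomplete rows is harmless under the first rule but biased under the second, which is why the mechanism behind the missing values matters.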

Dataset

Suppose we want to study the effect of student diligence on homework quality. Let’s imagine that we can somehow assign a real number to measure each student’s diligence, and that homework quality is measured on a scale from 0 to 10. In other words, our dataset looks like this:


Figure 1: Sample synthetic dataset

Each student’s diligence score is simply a random variable sampled from a standard normal distribution.
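A dataset like the one in Figure 1 could be simulated along the following lines. This is a minimal sketch under a stated assumption: the text only says that diligence is standard normal and that homework quality lies between 0 and 10, so modelling quality as Binomial(10, logistic(diligence)) is a hypothetical choice that keeps every score an integer in that range and lets more diligent students tend to score higher. The function name `simulate_homework` is illustrative.

```python
import numpy as np


def simulate_homework(n_students: int, seed: int = 0):
    """Simulate (diligence, homework quality) pairs.

    Assumption: quality ~ Binomial(10, logistic(diligence)); the article
    only states that quality is measured on a 0-10 scale.
    """
    rng = np.random.default_rng(seed)
    # Diligence: standard normal, as described in the text.
    diligence = rng.normal(loc=0.0, scale=1.0, size=n_students)
    # Hypothetical link: higher diligence -> higher success probability per point.
    p = 1.0 / (1.0 + np.exp(-diligence))
    quality = rng.binomial(n=10, p=p)
    return diligence, quality


S, H = simulate_homework(n_students=100, seed=42)
for s, h in zip(S[:5], H[:5]):
    print(f"diligence={s:+.2f}  homework={h}")
```

A Binomial outcome is used here because it naturally bounds the score between 0 and 10. Any other bounded, increasing relationship would serve equally well for a toy dataset.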


Bayesian Inference: How Missing Values Cause Biased Estimates