I. Introduction

Data plays a central role in data science and machine learning. We often assume that the data needed for analysis or model building is readily available and free. Sometimes, however, we do not have the data yet, and collecting the full dataset is either impossible or would take too long. In that case, we need a plan for collecting the most useful subset of data we can obtain quickly and efficiently. The process of planning an experiment to collect data is called the design of experiments. Surveys and clinical trials are two common settings where the design of experiments is applied.

In this article, we will discuss four main factors to keep in mind when designing and executing experiments for data collection.

Design of Experiments in Data Science
