The Data Science Process


Let’s suppose that you’ve been given a data problem to solve and you’re expected to produce unique insights from the data given to you. So the question is, what exactly do you do to take a data problem through to completion and generate data-driven insights? And most important of all, where do you start?

Let’s use an analogy here. In the construction of a house or building, the guiding piece of information is the blueprint. So what sort of information does a blueprint contain? Information about the building’s infrastructure, the layout and exact dimensions of each room, the location of water pipes and electrical wires, and so on.

Returning to the earlier question: where do we start when given a data problem? That is where the data science process comes in. As discussed in the following sections of this article, the data science process provides a systematic approach for tackling a data problem. By following these recommended guidelines, you will be able to rely on a tried-and-true workflow when approaching data science projects. So without further ado, let’s get started!

Data Science Life Cycle

The data science life cycle essentially consists of five stages: data collection, data cleaning, exploratory data analysis, model building, and model deployment. For more information, please check out the excellent video by Ken Jee, Different Data Science Roles Explained (by a Data Scientist). A summary infographic of this life cycle is shown below:


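To make the five stages of the life cycle a little more concrete, here is a minimal sketch in plain Python that chains them together as functions. Everything in it is an illustrative assumption rather than part of the original workflow: the function names, the tiny hard-coded dataset of study hours and exam scores, and the choice of a one-variable least-squares line as the "model". A real project would swap each stage for proper tooling.

```python
from statistics import mean


def collect_data():
    # Stage 1, data collection: hypothetical hard-coded records stand in
    # for a real source such as a database, an API, or a CSV file.
    return [
        {"hours": 1.0, "score": 52.0},
        {"hours": 2.0, "score": 54.0},
        {"hours": None, "score": 60.0},  # incomplete record
        {"hours": 3.0, "score": 56.0},
        {"hours": 4.0, "score": 58.0},
    ]


def clean_data(rows):
    # Stage 2, data cleaning: drop any record that has a missing value.
    return [r for r in rows if all(v is not None for v in r.values())]


def explore_data(rows):
    # Stage 3, exploratory data analysis: compute a few summary statistics.
    return {
        "n_rows": len(rows),
        "mean_hours": mean(r["hours"] for r in rows),
        "mean_score": mean(r["score"] for r in rows),
    }


def build_model(rows):
    # Stage 4, model building: fit score = intercept + slope * hours
    # by ordinary least squares (assumed here purely for illustration).
    xs = [r["hours"] for r in rows]
    ys = [r["score"] for r in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


def deploy_model(model):
    # Stage 5, model deployment: in this sketch, "deployment" only means
    # wrapping the fitted parameters in a callable that others can use.
    slope, intercept = model

    def predict(hours):
        return intercept + slope * hours

    return predict


def run_pipeline():
    # Run the five stages in order and return the EDA summary and predictor.
    rows = clean_data(collect_data())
    summary = explore_data(rows)
    predict = deploy_model(build_model(rows))
    return summary, predict


if __name__ == "__main__":
    summary, predict = run_pipeline()
    print(summary)
    print(predict(5.0))
```

The point of the sketch is the ordering: each stage consumes the output of the previous one, which is why skipping a stage such as cleaning tends to cause problems in every stage that follows.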