Data Analysis is a Prerequisite for Data Science. Here’s Why.

As the workforce landscape changes, many people are switching careers or applying to new companies after being laid off. Some are data analysts who want to become data scientists; others are data scientists who are changing companies and will need to spend time learning or refreshing their data analytics knowledge.

You might assume that data analysis is already taught in data science programs, but in my experience, many programs jump straight to data science, or more specifically, to machine learning algorithms. Just as it is easy to assume that a data scientist already knows data analytics, some universities and online courses make the same assumption and go right into the meat of common machine learning concepts. That assumption can leave some data scientists struggling with data analytics. Although it may seem simpler at first, data analytics is the foundation of data science. You must understand your business, your data, and your metrics, because this information is ultimately what feeds your statistical methods and data science models.

Below, I will summarize data analytics and data science, and give some examples of why data analytics is so important to data science.

Data Analysis

Data analytics is often referred to as business intelligence, BI development, or product analytics. It is practiced at nearly every tech business, and at most other businesses as well. A company needs data analytics to maintain visibility into its finances, its customer data, and areas for improvement where a future machine learning model could be applied. Data analytics is typically done with tools like:

Tableau, Looker, Google Data Studio, SQL, Excel, and sometimes Python.

Examples of data analytics can be:

  • finding the top users of a product
  • gathering key demographics of those users
  • computing summary metrics for a product
  • finding seasonal trends in user behavior
  • highlighting anomalies in financials

As you can see, every one of these examples can be applied to the data science process in some way. Their results can also serve as features, or attributes, that are fed into your model.
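To make the link between analytics output and model features concrete, here is a minimal sketch in plain Python. It is only an illustration: the event records, the field names, and the helper functions `user_features` and `top_users` are all made up for this example, and in practice you would more likely do this in SQL or a BI tool.

```python
from collections import defaultdict

# Hypothetical purchase events: (user_id, month, amount).
# The data is invented purely for illustration.
EVENTS = [
    ("u1", 1, 20.0),
    ("u2", 1, 5.0),
    ("u1", 2, 35.0),
    ("u3", 2, 15.0),
    ("u2", 3, 7.5),
    ("u1", 3, 10.0),
]


def user_features(events):
    """Aggregate raw events into one row of summary metrics per user."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for user_id, _month, amount in events:
        totals[user_id] += amount
        counts[user_id] += 1
    return {
        user_id: {
            "total_spend": totals[user_id],
            "n_purchases": counts[user_id],
            "avg_purchase": totals[user_id] / counts[user_id],
        }
        for user_id in totals
    }


def top_users(features, n=2):
    """Rank users by total spend, highest first, and keep the top n."""
    ranked = sorted(features, key=lambda u: features[u]["total_spend"], reverse=True)
    return ranked[:n]


features = user_features(EVENTS)
print(top_users(features))   # the "top users" analytics question
print(features["u1"])        # the same numbers, reusable as model features
```

The same per-user table answers an analytics question (who are the top users?) and can be handed to a model as input columns, which is the sense in which analysis feeds data science.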

Data Science

Data science is becoming an increasingly popular career path. It is essentially a career in which you automate otherwise manual processes using programming languages and statistics. It is built on data analytics and mathematics. Common tools that data scientists can expect to use are:

Data Analysis, SQL, Tableau, Python, R, SAS, Terminal, Jupyter Notebook, AWS, GCP, sklearn and TensorFlow libraries (as well as many more).

Several parts of the overall data science process can involve data analysis, such as data formation/creation, data cleansing, exploratory data analysis (especially this part), feature engineering, and interpretation of suggestions/predictions/results.

Examples of data science can be:

  • clustering for customer behavior
  • categorization of clothing styles
  • face image detection and categorization
  • a chatbot utilizing natural language processing (NLP)
  • automatic anomaly detection for fraud

Most, if not all, of these examples rely on data analysis twice: first before any modeling begins, and again after the data science process, when the results are interpreted.
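As a rough illustration of how an analytics task such as highlighting anomalies in financials can turn into an automated one, the sketch below flags values that sit far from the mean. This is a simple z-score rule that I am using as a stand-in, not a real fraud model; the function name `flag_anomalies`, the threshold of 2.0, and the daily totals are all assumptions made for the example.

```python
from statistics import mean, stdev


def flag_anomalies(amounts, threshold=2.0):
    """Return the indices of values more than `threshold` sample
    standard deviations away from the mean (a basic z-score rule)."""
    mu = mean(amounts)
    sigma = stdev(amounts)
    if sigma == 0:
        # Every value is identical, so nothing stands out.
        return []
    return [i for i, x in enumerate(amounts) if abs(x - mu) / sigma > threshold]


# Hypothetical daily transaction totals with one obvious spike.
daily_totals = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 250.0, 97.0]
print(flag_anomalies(daily_totals))
```

Choosing the threshold, and deciding whether a flagged day is actually a problem, still depends on knowing the business and the data, which is the analysis work that happens before and after the automated step.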

