There are several platforms and programming languages for data science and machine learning project implementation (see image above). Even though Python and R are considered the top two programming languages for data science and machine learning, the fundamental skill in data science is mastering the logical reasoning and flow of the process, not the programming language used.
In this article, we consider the workflow for two data science projects that demonstrate the logical reasoning and flow of the process, irrespective of the programming language employed for implementation.
A data visualization workflow typically consists of the following components. This workflow can be implemented in any programming language, such as R, Python, MATLAB, or C++.
Typical workflow for a data visualization project. Image by Benjamin O. Tayo
a) Data Component: An important first step in deciding how to visualize data is to know what type of data it is, e.g. categorical data, discrete data, continuous data, time-series data, etc.
b) Geometric Component: Here is where you decide what kind of visualization is suitable for your data, e.g. scatter plot, line graphs, bar plots, histograms, Q-Q plots, smooth densities, boxplots, pair plots, heatmaps, etc.
c) Mapping Component: Here you need to decide what variable to use as your _x_-variable (independent or predictor variable) and what to use as your _y_-variable (dependent or target variable). This is especially important when your dataset is multi-dimensional with several features.
d) Scale Component: Here you decide what kind of scales to use, e.g. linear scale, log scale, etc.
e) Labels Component: This includes things like axis labels, titles, legends, font sizes, etc.
f) Ethical Component: Here, you want to make sure your visualization tells the true story. You need to be aware of your actions when cleaning, summarizing, manipulating, and producing a data visualization and ensure you aren’t using your visualization to mislead or manipulate your audience.
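To make the components above concrete, here is a minimal sketch in plain Python that writes a visualization plan down as data before any plotting library is involved. Everything in it is an illustrative assumption rather than part of the original workflow: the `PlotSpec` class, the `GEOMETRY_BY_DATA_TYPE` table, the `apply_scale` helper, and the example variable names are all hypothetical, and the pairing of data types with geometries is only a rough rule of thumb.

```python
import math
from dataclasses import dataclass

# Hypothetical rule of thumb: data type -> geometries that usually suit it.
# These pairings are illustrative assumptions, not an exhaustive guide.
GEOMETRY_BY_DATA_TYPE = {
    "categorical": ["bar plot"],
    "discrete": ["bar plot", "histogram"],
    "continuous": ["histogram", "boxplot", "scatter plot"],
    "time-series": ["line graph"],
}


@dataclass
class PlotSpec:
    data_type: str          # a) data component
    geometry: str           # b) geometric component
    x: str                  # c) mapping: independent (predictor) variable
    y: str                  # c) mapping: dependent (target) variable
    scale: str = "linear"   # d) scale component
    title: str = ""         # e) labels component
    x_label: str = ""
    y_label: str = ""

    def validate(self):
        """Return a list of problems; an empty list means the plan is consistent."""
        problems = []
        allowed = GEOMETRY_BY_DATA_TYPE.get(self.data_type)
        if allowed is None:
            problems.append(f"unknown data type: {self.data_type}")
        elif self.geometry not in allowed:
            problems.append(f"{self.geometry} is unusual for {self.data_type} data")
        if self.scale not in ("linear", "log"):
            problems.append(f"unknown scale: {self.scale}")
        # f) ethical component, in part: an unlabeled plot is easy to misread.
        if not (self.title and self.x_label and self.y_label):
            problems.append("missing title or axis labels")
        return problems


def apply_scale(values, scale):
    """Transform values for plotting; a log scale requires positive values."""
    if scale == "log":
        if any(v <= 0 for v in values):
            raise ValueError("log scale needs strictly positive values")
        return [math.log10(v) for v in values]
    return list(values)


# Example plan for a made-up time series (variable names are hypothetical).
spec = PlotSpec(
    data_type="time-series",
    geometry="line graph",
    x="year",
    y="population",
    scale="log",
    title="Population over time",
    x_label="Year",
    y_label="Population (log10)",
)
problems = spec.validate()
scaled = apply_scale([1000, 10000, 100000], spec.scale)
```

Because the plan is just data, the same decisions could be carried over to R, MATLAB, or any other language; only the final rendering call would differ. The ethical component cannot be fully automated, so the label check here covers only one small part of it.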
An example of a data visualization workflow with Python and R implementations can be found here:
A machine learning workflow consists of the following steps, which are independent of the programming language used for implementation.
Typical workflow for a machine learning project. Image by Benjamin O. Tayo